About Miratech :
Miratech helps visionaries change the world.
We are a global IT services and consulting company that bridges enterprise and startup innovation. Since 1989, Miratech has been driving digital transformation for some of the world's largest enterprises. With coverage across five continents and operations in more than 25 countries, our 1,000+ professionals deliver technology solutions that help businesses innovate, scale, and thrive.
Our values-driven culture of Relentless Performance has enabled over 99% of Miratech's engagements to meet or exceed client expectations in scope, schedule, and budget. With an annual growth rate of over 25%, we are proud to stand at the intersection of global expertise, cutting-edge technology, and human-driven innovation.
About the Role :
Join us in revolutionizing customer experiences with our client, a global leader in cloud contact center software.
As a Senior Site Reliability Engineer (SRE), you will play a critical role in enhancing observability, automation, and monitoring across a highly distributed, microservices-based architecture. Your primary focus will be building scalable, automated, and unified monitoring systems that provide clear visibility into system health, enabling faster detection and resolution of performance issues.
This role focuses on observability and operational automation rather than direct application development. You will collaborate with cross-functional teams, create robust dashboards, optimize metrics and logging pipelines, and ensure that all service owners have access to actionable insights to maintain system stability and performance.
Key Responsibilities :
Monitoring & Observability :
- Design, build, and automate comprehensive monitoring and alerting systems across 30+ microservices.
- Implement consistent observability standards, ensuring all metrics and dashboards have a unified structure and visual identity.
- Work with existing tools such as New Relic and Google Cloud Operations Suite, and drive the migration to Prometheus and Grafana for improved flexibility and control.
- Monitor system health across SQL and NoSQL databases and ensure database performance metrics are integrated into dashboards.
Automation & Infrastructure as Code (IaC) :
- Automate dashboard provisioning, configuration, and deployment using Infrastructure-as-Code (IaC) tools like Terraform.
- Automate data collection, processing, and visualization pipelines to ensure consistent, real-time observability.
- Develop automation scripts in Python and Shell to enhance reliability, streamline monitoring workflows, and reduce manual intervention.
Incident Management & Troubleshooting :
- Proactively identify anomalies, performance bottlenecks, or degradation trends through observability tools.
- Assist in troubleshooting production issues by analyzing dashboards, logs, and metrics, and collaborating with engineering teams to pinpoint root causes (without modifying application code).
- Provide actionable insights to development teams to improve reliability and prevent recurring incidents.
Collaboration & Continuous Improvement :
- Partner closely with service owners, DevOps, and platform engineers to define observability requirements and build effective monitoring strategies.
- Contribute to internal documentation, best practices, and knowledge-sharing initiatives.
- Participate in on-call rotations for critical systems and ensure effective escalation handling.
- Advocate for SRE principles such as SLIs, SLOs, and error budgets, fostering a culture of reliability.
Required Skills and Experience :
- 5+ years of experience as a Site Reliability Engineer, DevOps Engineer, or in a similar infrastructure-focused role.
- Strong hands-on experience with Kubernetes and Terraform in production environments.
- Proven expertise in observability platforms, particularly Prometheus and Grafana (experience with New Relic or GCP Monitoring is a plus).
- Proficiency in Python and Shell scripting for automation, system monitoring, and configuration management.
- Solid understanding of Linux system internals, including resource management, performance tuning, and troubleshooting.
- Practical experience with SQL and NoSQL databases, including performance monitoring and query optimization.
- Familiarity with cloud environments such as AWS, Azure, or Google Cloud Platform (GCP).
- Strong understanding of load balancing, networking fundamentals, and cloud infrastructure stacks.
Nice to Have :
- Experience designing or optimizing Big Data queries and integrating them into observability dashboards.
- Working knowledge of incident management frameworks, SRE best practices, and reliability engineering concepts (SLIs, SLOs, error budgets).
- Exposure to infrastructure monitoring at scale in multi-tenant or SaaS environments.
What We Offer :
- Culture of Relentless Performance : Join an award-winning technology organization with a 99% project success rate and 30%+ year-over-year growth.
- Competitive Pay and Benefits : Comprehensive compensation package including health insurance, language courses, and relocation assistance (where applicable).
- Work-from-Anywhere Flexibility : Enjoy remote or hybrid work options to support a healthy work-life balance.
- Career Growth & Learning : Access certification programs, mentorship opportunities, internal mobility, and talent development initiatives.
- Global Impact : Collaborate on transformative projects with leading global clients shaping the future of cloud technology.
- Inclusive & Multicultural Environment : Thrive in a diverse, global team with open communication, team-building activities, and cross-cultural collaboration.
- Commitment to Sustainability : Be part of Miratech's social sustainability mission focused on IT education, community empowerment, fair practices, environmental stewardship, and gender equality.
Equal Opportunity Statement :
Miratech is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based on race, color, religion, gender, sexual orientation, national origin, age, disability, veteran status, or any other legally protected status.
(ref : hirist.tech)