Engineer Site Reliability

TMUS Global Solutions • Hyderabad, India

11 days ago

Job description

About the Role

The Site Reliability Engineer ensures digital systems are reliable, resilient, and scalable. This role automates operational processes, reduces manual intervention, and strengthens incident response across complex environments. With expertise in infrastructure, scripting, cloud services, and observability, the Site Reliability Engineer plays a key role in maintaining system uptime and driving continuous improvements in performance and deployment workflows.

What You'll Do

Automate processes to enhance system reliability and scalability
Implement proactive monitoring and maintenance to prevent incidents
Streamline CI/CD and development-to-deployment workflows
Develop tools and scripts that reduce manual operational efforts
Respond to incidents, manage root cause analysis, and minimize service disruption
Continuously research and adopt new technologies for performance gains
Partner with cross-functional teams to improve end-to-end system performance
Support other duties and technical projects as required by leadership

What You'll Bring

Bachelor's degree in Computer Science, Software Engineering, or a related technical field

2-5 years of experience in SRE, DevOps, or cloud-native infrastructure roles

Proven ability to build and manage CI/CD pipelines

Experience with cloud-native platforms and technologies (e.g., AWS, Azure, GCP)

Strong scripting skills (e.g., Python, Bash) and systems troubleshooting

Knowledge of Agile principles and automation best practices

Excellent problem-solving and communication skills

Certifications (preferred): CKA, AWS DevOps Engineer, SRE Foundation

Must Have Skills

Programming Languages: Proficiency in at least one of Python, Java, or JavaScript

Cloud Platforms: Experience with any major cloud provider (AWS, Azure, or GCP)

Infrastructure as Code (IaC): Hands-on experience with tools like Terraform, CloudFormation, or Pulumi

CI/CD Pipelines: Familiarity with tools such as GitHub Actions, GitLab CI, Jenkins, or Argo CD

Containerization: Experience with Docker and orchestration tools like Kubernetes

Observability & Monitoring: Knowledge of tools such as Prometheus, Grafana, Splunk, CloudWatch, or Datadog

Nice To Have

Experience with chaos engineering or resilience testing

Familiarity with service mesh (Istio, Linkerd), edge proxies, or policy engines

Exposure to SRE metrics (SLOs, SLIs, error budgets) and golden signals monitoring
