Engineer Site Reliability

TMUS Global Solutions • Hyderabad, India
11 days ago
Job description

About the Role

The Site Reliability Engineer ensures digital systems are reliable, resilient, and scalable. This role automates operational processes, reduces manual intervention, and strengthens incident response across complex environments. With expertise in infrastructure, scripting, cloud services, and observability, the Site Reliability Engineer plays a key role in maintaining system uptime and driving continuous improvements in performance and deployment workflows.

What You'll Do

  • Automate processes to enhance system reliability and scalability
  • Implement proactive monitoring and maintenance to prevent incidents
  • Streamline CI/CD and development-to-deployment workflows
  • Develop tools and scripts that reduce manual operational efforts
  • Respond to incidents, manage root cause analysis, and minimize service disruption
  • Continuously research and adopt new technologies for performance gains
  • Partner with cross-functional teams to improve end-to-end system performance
  • Support other duties and technical projects as required by leadership

What You'll Bring

  • Bachelor's degree in Computer Science, Software Engineering, or a related technical field
  • 2-5 years of experience in SRE, DevOps, or cloud-native infrastructure roles
  • Proven ability to build and manage CI/CD pipelines
  • Experience with cloud-native platforms and technologies (e.g., AWS, Azure, GCP)
  • Strong scripting skills (e.g., Python, Bash) and systems troubleshooting
  • Knowledge of Agile principles and automation best practices
  • Excellent problem-solving and communication skills
  • Certifications (preferred): CKA, AWS DevOps Engineer, SRE Foundation

Must Have Skills

  • Programming Languages: Proficiency in at least one of Python, Java, or JavaScript
  • Cloud Platforms: Experience with any major cloud provider (AWS, Azure, or GCP)
  • Infrastructure as Code (IaC): Hands-on experience with tools like Terraform, CloudFormation, or Pulumi
  • CI/CD Pipelines: Familiarity with tools such as GitHub Actions, GitLab CI, Jenkins, or Argo CD
  • Containerization: Experience with Docker and orchestration tools like Kubernetes
  • Observability & Monitoring: Knowledge of tools such as Prometheus, Grafana, Splunk, CloudWatch, or Datadog

Nice To Have

  • Experience with chaos engineering or resilience testing
  • Familiarity with service mesh (Istio, Linkerd), edge proxies, or policy engines
  • Exposure to SRE metrics (SLOs, SLIs, error budgets) and golden-signals monitoring