
Engineer, Site Reliability [T500-20503]

ANSR • Hyderabad, Telangana, India

7 days ago

Job description

ANSR is hiring for one of its clients.

About T-Mobile:

T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

About TMUS Global Solutions:

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited is a subsidiary of T-Mobile US, Inc. and operates as TMUS Global Solutions.

About the Role:

The Site Reliability Engineer ensures digital systems are reliable, resilient, and scalable. This role automates operational processes, reduces manual intervention, and strengthens incident response across complex environments. With expertise in infrastructure, scripting, cloud services, and observability, the Site Reliability Engineer is essential to maintaining system uptime and delivering continuous improvements in performance and deployment workflows.

What You’ll Do:

Automate processes to enhance system reliability and scalability
Implement proactive monitoring and maintenance to prevent incidents
Streamline CI/CD and development-to-deployment workflows
Develop tools and scripts that reduce manual operational efforts
Respond to incidents, manage root cause analysis, and minimize service disruption
Continuously research and adopt new technologies for performance gains
Partner with cross-functional teams to improve end-to-end system performance
Support other duties and technical projects as required by leadership

What You’ll Bring:

Bachelor’s degree in Computer Science, Software Engineering, or a related technical field

2–5 years of experience in SRE, DevOps, or cloud-native infrastructure roles

Proven ability to build and manage CI/CD pipelines

Experience with cloud-native platforms and technologies (e.g., AWS, Azure, GCP)

Strong scripting skills (e.g., Python, Bash) and systems troubleshooting

Knowledge of Agile principles and automation best practices

Excellent problem-solving and communication skills

Certifications (preferred): CKA, AWS DevOps Engineer, SRE Foundation

Must Have Skills:

Hands-on experience with Monitoring & Observability tools such as Prometheus, Grafana, Dynatrace, Datadog, or Splunk for system health, application metrics, and alerting.

Strong expertise in automation scripting using Python, Bash, or Perl to streamline operational workflows and reduce manual intervention.

Practical experience with DevOps practices and CI/CD tools such as Jenkins or GitLab CI, ensuring automated and reliable software delivery pipelines.

Proficiency in Infrastructure as Code (IaC) using tools such as Terraform, AWS CloudFormation, or Pulumi for scalable cloud provisioning.

Solid understanding of cloud platforms (AWS or Azure), with experience managing cloud-native applications and infrastructure.

Deep knowledge of containerization and orchestration using Docker and Kubernetes, including managing deployments and scaling services.

Exposure to performance tuning and optimization, especially at the API layer, to improve system responsiveness and reliability.

Nice To Have:

Experience with chaos engineering or resilience testing.

Familiarity with service mesh (Istio, Linkerd), edge proxies, or policy engines.

Exposure to SRE metrics (SLOs, SLIs, Error Budgets) and golden signals monitoring.

