Sr Engineer, Site Reliability [T500-20286]

ANSR, Hyderabad, India

5 days ago

Job description

About T-Mobile

T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited is a subsidiary of T-Mobile US, Inc. and operates as TMUS Global Solutions.

About the Role

As a Senior Site Reliability Engineer, you will be a key member of the CFL Platform Engineering and Operations team. You will play a pivotal role in building and scaling intelligent infrastructure to support AI/ML applications, enterprise services, and LLM-based platforms. You will contribute to the design and implementation of observability frameworks, automation-first operations, and incident response strategies to ensure reliability, performance, and scalability across production systems.

What You’ll Do

Implement and maintain observability, monitoring, and alerting systems for AI platforms and backend services

Design and support telemetry pipelines, logging infrastructure, and dashboards (Splunk, Prometheus, Grafana, OpenTelemetry)

Define and monitor SLOs, SLIs, latency, availability, and throughput metrics

Participate in on-call rotations, incident resolution, root cause analysis, and postmortems

Improve CI/CD workflows and infrastructure automation using GitLab pipelines

Optimize and scale infrastructure including Kafka, RabbitMQ, HAProxy, and distributed APIs

Collaborate with engineering teams on governance, compliance, and secure operations

Support capacity planning, cost analysis, and tuning for high-scale performance

Automate repetitive tasks and reduce toil via scripting (Python, Bash, Java)

Contribute to runbooks, knowledge base articles, and SRE best practice documentation

Mentor junior engineers and support a culture of operational excellence and reliability

What You’ll Bring

Bachelor’s degree in Computer Science, Engineering, or a related technical field

4-7 years of experience in SRE, DevOps, platform, or operations engineering roles

Strong hands-on experience in observability, monitoring, and distributed systems troubleshooting

Proficiency in scripting languages such as Python, Bash, or PowerShell

CI/CD experience with GitLab and automation across deployment pipelines

Solid understanding of SQL and NoSQL systems including Oracle DB and MongoDB

Familiarity with Kubernetes, container orchestration, and hybrid cloud (Azure, AWS, GCP, OCI)

Experience working in high-stakes, incident-driven environments

Strong working knowledge of Splunk, Grafana, Prometheus, and other observability tools

Understanding of AI/ML systems, inference APIs, and LLM infrastructure is a plus

Experience in platform compliance, security enforcement, and regulated domains (finance preferred)

Must Have Skills

Application & Microservice: Java, Spring Boot, API & Service Design

Any CI/CD Tools: GitLab Pipeline / Test Automation / GitHub Actions / Jenkins / CircleCI

App Platform: Docker & Containers (Kubernetes)

Any Databases: SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)

Any Messaging: Kafka, RabbitMQ

Any Observability / Monitoring: Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus

Incident / Change / Problem Management

Nice To Have

Multi-region failover (SQL Server, MongoDB, vendors)

Observability platform design (sampling, retention policies)

Own domain SLOs and error budgets

Performance engineering for latency-sensitive apps

Toil automation (SRE bots, operators)
