Sr Engineer, Site Reliability [T500-20286]

ANSR, India
9 days ago
Job description

About T-Mobile

T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited is a subsidiary of T-Mobile US, Inc. and operates as TMUS Global Solutions.

About the Role

As a Senior Site Reliability Engineer, you will be a key member of the CFL Platform Engineering and Operations team. You will play a pivotal role in building and scaling intelligent infrastructure to support AI/ML applications, enterprise services, and LLM-based platforms. You will contribute to the design and implementation of observability frameworks, automation-first operations, and incident response strategies to ensure reliability, performance, and scalability across production systems.

What You’ll Do

Implement and maintain observability, monitoring, and alerting systems for AI platforms and backend services

Design and support telemetry pipelines, logging infrastructure, and dashboards (Splunk, Prometheus, Grafana, OpenTelemetry)

Define and monitor SLOs, SLIs, latency, availability, and throughput metrics

Participate in on-call rotations, incident resolution, root cause analysis, and postmortems

Improve CI/CD workflows and infrastructure automation using GitLab pipelines

Optimize and scale infrastructure including Kafka, RabbitMQ, HAProxy, and distributed APIs

Collaborate with engineering teams on governance, compliance, and secure operations

Support capacity planning, cost analysis, and tuning for high-scale performance

Automate repetitive tasks and reduce toil via scripting (Python, Bash, Java)

Contribute to runbooks, knowledge base articles, and SRE best practice documentation

Mentor junior engineers and support a culture of operational excellence and reliability

What You’ll Bring

Bachelor’s degree in Computer Science, Engineering, or a related technical field

4-7 years of experience in SRE, DevOps, platform, or operations engineering roles

Strong hands-on experience in observability, monitoring, and distributed systems troubleshooting

Proficiency in scripting languages such as Python, Bash, or PowerShell

CI/CD experience with GitLab and automation across deployment pipelines

Solid understanding of SQL and NoSQL systems including Oracle DB and MongoDB

Familiarity with Kubernetes, container orchestration, and hybrid cloud (Azure, AWS, GCP, OCI)

Experience working in high-stakes, incident-driven environments

Strong working knowledge of Splunk, Grafana, Prometheus, and other observability tools

Understanding of AI/ML systems, inference APIs, and LLM infrastructure is a plus

Experience in platform compliance, security enforcement, and regulated domains (finance preferred)

Must Have Skills

Application & Microservices: Java, Spring Boot, API & Service Design

Any CI/CD Tools: GitLab Pipelines / Test Automation / GitHub Actions / Jenkins / CircleCI

App Platform: Docker & Containers (Kubernetes)

Any Databases: SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)

Any Messaging: Kafka, RabbitMQ

Any Observability / Monitoring: Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus

Incident / Change / Problem Management

Nice To Have

Multi-region failover (SQL Server, MongoDB, vendors)

Observability platform design (sampling, retention policies)

Own domain SLOs and error budgets

Performance engineering for latency-sensitive apps

Toil automation (SRE bots, operators)
