Talent.com
No longer accepting applications
Principal Engineer, Site Reliability [T500-20295]

TMUS Global Solutions, India
22 days ago
Job description

About T-Mobile :

T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions :

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

About the Role :

As a Principal SRE, you will be a key member of the CFL Platform Engineering and Operations team. You will lead reliability engineering for AI-powered platforms supporting LLM applications, AI gateways, and enterprise-scale services across finance, credit, collections, and document systems. You will design and implement observability and incident response frameworks, scale high-performance infrastructure, and champion SRE best practices to support secure, automated, and resilient systems.

What You’ll Do :

  • Architect observability and incident response pipelines for LLM, API, and backend systems
  • Define SLAs, SLIs, alerts, and dashboards for latency, throughput, and availability
  • Lead high-severity incident response, root cause analysis, and system recovery
  • Collaborate with AI, Platform, and Security teams to enforce operational guardrails
  • Implement automation-first strategies using GitLab CI / CD, Terraform, and deployment tooling
  • Guide infrastructure tuning, capacity planning, and cost optimization
  • Drive monitoring across hybrid clouds using Prometheus, Grafana, Splunk, and OpenTelemetry
  • Support AIOps, model observability, policy enforcement, and audit readiness
  • Mentor senior SREs and foster a high-ownership, technical excellence culture

What You’ll Bring :

  • Bachelor's or Master’s in Computer Science, Engineering, or related field
  • 7-12 years in SRE, infrastructure, or platform roles in distributed systems
  • Strong experience in incident management, AI / ML observability, and performance engineering
  • Hands-on expertise with OpenAI APIs, inference systems, AI gateways, and secure APIs
  • Proficiency in Python, Java, Bash / PowerShell, YAML
  • Deep knowledge of CI / CD workflows, GitLab pipelines, and SDLC processes
  • Experience with Kafka, HAProxy, RabbitMQ, Oracle DB, MongoDB
  • Proven success in scaling cloud-native platforms on Azure, AWS, GCP, or OCI
  • Familiarity with AIOps, latency scoring, policy validation, and secure AI operations
  • Background in compliance, governance, and enterprise risk management for AI systems
  • Advanced debugging skills across data, infrastructure, networking, and app layers
  • Leadership in chaos engineering, SLO-based operations, and system resilience

Must Have Skills :

  • Application & Microservice : Java, Spring Boot, API & Service Design
  • Any CI / CD Tools : GitLab Pipeline / Test Automation / GitHub Actions / Jenkins / CircleCI
  • App Platform : Docker & Containers (Kubernetes)
  • Any Databases : SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)
  • Any Messaging : Kafka, RabbitMQ
  • Any Observability / Monitoring : Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus
  • Incident / Change / Problem Management

Nice To Have :

  • Compliance-aligned continuity planning (PCI, SOX)
  • Error-budget pacts with product / org leadership
  • Executive Incident / Change / Problem / risk reporting
  • Observability cost vs coverage trade-offs
  • Org-wide reliability governance strategy