Talent.com
Principal Engineer, Site Reliability [T500-20295]

TMUS Global Solutions | Hyderabad, Telangana, India
17 days ago
Job description

About T-Mobile :

T-Mobile US, Inc. (NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions :

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

About the Role :

As a Principal SRE, you will be a key member of the CFL Platform Engineering and Operations team. You will lead reliability engineering for AI-powered platforms supporting LLM applications, AI gateways, and enterprise-scale services across finance, credit, collections, and document systems. You will design and implement observability and incident response frameworks, scale high-performance infrastructure, and champion SRE best practices to support secure, automated, and resilient systems.

What You’ll Do :

  • Architect observability and incident response pipelines for LLM, API, and backend systems
  • Define SLAs, SLIs, alerts, and dashboards for latency, throughput, and availability
  • Lead high-severity incident response, root cause analysis, and system recovery
  • Collaborate with AI, Platform, and Security teams to enforce operational guardrails
  • Implement automation-first strategies using GitLab CI / CD, Terraform, and deployment tooling
  • Guide infrastructure tuning, capacity planning, and cost optimization
  • Drive monitoring across hybrid clouds using Prometheus, Grafana, Splunk, and OpenTelemetry
  • Support AIOps, model observability, policy enforcement, and audit readiness
  • Mentor senior SREs and foster a high-ownership, technical excellence culture

What You’ll Bring :

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 7-12 years in SRE, infrastructure, or platform roles in distributed systems
  • Strong experience in incident management, AI / ML observability, and performance engineering
  • Hands-on expertise with OpenAI APIs, inference systems, AI gateways, and secure APIs
  • Proficiency in Python, Java, Bash / PowerShell, YAML
  • Deep knowledge of CI / CD workflows, GitLab pipelines, and SDLC processes
  • Experience with Kafka, HAProxy, RabbitMQ, Oracle DB, MongoDB
  • Proven success in scaling cloud-native platforms on Azure, AWS, GCP, or OCI
  • Familiarity with AIOps, latency scoring, policy validation, and secure AI operations
  • Background in compliance, governance, and enterprise risk management for AI systems
  • Advanced debugging skills across data, infrastructure, networking, and app layers
  • Leadership in chaos engineering, SLO-based operations, and system resilience

Must Have Skills :

  • Application & Microservice : Java, Spring Boot, API & Service Design
  • Any CI / CD Tools : GitLab Pipeline / Test Automation / GitHub Actions / Jenkins / CircleCI
  • App Platform : Docker & Containers (Kubernetes)
  • Any Databases : SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)
  • Any Messaging : Kafka, RabbitMQ
  • Any Observability / Monitoring : Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus
  • Incident / Change / Problem Management

Nice To Have :

  • Compliance-aligned continuity planning (PCI, SOX)
  • Error-budget pacts with product / org leadership
  • Executive Incident / Change / Problem / risk reporting
  • Observability cost vs coverage trade-offs
  • Org-wide reliability governance strategy