Talent.com
Principal Engineer, Site Reliability [T500-20295]

TMUS Global Solutions, India
9 hours ago
Job description

About T-Mobile :

T-Mobile US, Inc. (NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions :

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

About the Role :

As a Principal SRE, you will be a key member of the CFL Platform Engineering and Operations team. You will lead reliability engineering for AI-powered platforms supporting LLM applications, AI gateways, and enterprise-scale services across finance, credit, collections, and document systems. You will design and implement observability and incident response frameworks, scale high-performance infrastructure, and champion SRE best practices to support secure, automated, and resilient systems.

What You’ll Do :

  • Architect observability and incident response pipelines for LLM, API, and backend systems
  • Define SLAs, SLIs, alerts, and dashboards for latency, throughput, and availability
  • Lead high-severity incident response, root cause analysis, and system recovery
  • Collaborate with AI, Platform, and Security teams to enforce operational guardrails
  • Implement automation-first strategies using GitLab CI / CD, Terraform, and deployment tooling
  • Guide infrastructure tuning, capacity planning, and cost optimization
  • Drive monitoring across hybrid clouds using Prometheus, Grafana, Splunk, and OpenTelemetry
  • Support AIOps, model observability, policy enforcement, and audit readiness
  • Mentor senior SREs and foster a high-ownership, technical excellence culture

What You’ll Bring :

  • Bachelor's or Master’s in Computer Science, Engineering, or related field
  • 7-12 years in SRE, infrastructure, or platform roles in distributed systems
  • Strong experience in incident management, AI / ML observability, and performance engineering
  • Hands-on expertise with OpenAI APIs, inference systems, AI gateways, and secure APIs
  • Proficiency in Python, Java, Bash / PowerShell, YAML
  • Deep knowledge of CI / CD workflows, GitLab pipelines, and SDLC processes
  • Experience with Kafka, HAProxy, RabbitMQ, Oracle DB, MongoDB
  • Proven success in scaling cloud-native platforms on Azure, AWS, GCP, or OCI
  • Familiarity with AIOps, latency scoring, policy validation, and secure AI operations
  • Background in compliance, governance, and enterprise risk management for AI systems
  • Advanced debugging skills across data, infrastructure, networking, and app layers
  • Leadership in chaos engineering, SLO-based operations, and system resilience

Must Have Skills :

  • Application & Microservice : Java, Spring Boot, API & Service Design
  • Any CI / CD Tools : GitLab Pipeline / Test Automation / GitHub Actions / Jenkins / CircleCI
  • App Platform : Docker & Containers (Kubernetes)
  • Any Databases : SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)
  • Any Messaging : Kafka, RabbitMQ
  • Any Observability / Monitoring : Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus
  • Incident / Change / Problem Management

Nice To Have :

  • Compliance-aligned continuity planning (PCI, SOX)
  • Error-budget pacts with product / org leadership
  • Executive incident / change / problem / risk reporting
  • Observability cost vs coverage trade-offs
  • Org-wide reliability governance strategy