Talent.com
No longer accepting applications
Principal Engineer, Site Reliability

TMUS Global Solutions, Hyderabad, Republic of India, IN
27 days ago
Job description

About T-Mobile:

T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions:

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

About the Role:

As a Senior Site Reliability Engineer and a key member of the CFL Platform Engineering and Operations team, you will play a pivotal role in building and scaling intelligent infrastructure that supports AI/ML applications, enterprise services, and LLM-based platforms. You will contribute to the design and implementation of observability frameworks, automation-first operations, and incident response strategies that ensure reliability, performance, and scalability across production systems.

What You’ll Do:

  • Implement and maintain observability, monitoring, and alerting systems for AI platforms and backend services
  • Design and support telemetry pipelines, logging infrastructure, and dashboards (Splunk, Prometheus, Grafana, OpenTelemetry)
  • Define and monitor SLOs and SLIs, along with latency, availability, and throughput metrics
  • Participate in on-call rotations, incident resolution, root cause analysis, and postmortems
  • Improve CI/CD workflows and infrastructure automation using GitLab pipelines
  • Optimize and scale infrastructure, including Kafka, RabbitMQ, HAProxy, and distributed APIs
  • Collaborate with engineering teams on governance, compliance, and secure operations
  • Support capacity planning, cost analysis, and tuning for high-scale performance
  • Automate repetitive tasks and reduce toil via scripting (Python, Bash, Java)
  • Contribute to runbooks, knowledge base articles, and SRE best practice documentation
  • Mentor junior engineers and support a culture of operational excellence and reliability

What You’ll Bring:

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field
  • 4-7 years in SRE, DevOps, platform, or operations engineering roles
  • Strong hands-on experience in observability, monitoring, and distributed systems troubleshooting
  • Proficiency in scripting languages such as Python, Bash, or PowerShell
  • CI/CD experience with GitLab and automation across deployment pipelines
  • Solid understanding of SQL and NoSQL systems, including Oracle DB and MongoDB
  • Familiarity with Kubernetes, container orchestration, and hybrid cloud (Azure, AWS, GCP, OCI)
  • Experience working in high-stakes, incident-driven environments
  • Strong working knowledge of Splunk, Grafana, Prometheus, and other observability tools
  • Understanding of AI/ML systems, inference APIs, and LLM infrastructure is a plus
  • Experience in platform compliance, security enforcement, and regulated domains (finance preferred)

Must-Have Skills:

  • Application & microservices: Java, Spring Boot, API and service design
  • Any CI/CD tools: GitLab Pipelines / test automation / GitHub Actions / Jenkins / CircleCI
  • App platform: Docker and containers (Kubernetes)
  • Any databases: SQL and NoSQL (Cassandra / Oracle / Snowflake / MongoDB)
  • Any messaging: Kafka, RabbitMQ
  • Any observability/monitoring: Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus
  • Incident / Change / Problem Management

Nice to Have:

  • Multi-region failover (SQL Server, MongoDB, vendors)
  • Observability platform design (sampling, retention policies)
  • Ownership of domain SLOs and error budgets
  • Performance engineering for latency-sensitive apps
  • Toil automation (SRE bots, operators)