Sr Engineer, Software - AIOps [T500-20351]

ANSR, Hyderabad, Telangana, India
9 days ago
Job description

About T-Mobile :

T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions :

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

About the Role :

As a Senior AIOps Engineer, you will be a key member of the CFL Platform Engineering and Operations team. You will help design and implement next-generation intelligent operations that support AI / ML platforms, LLM-based applications, and large-scale distributed systems. You’ll develop automation, observability, and remediation pipelines that enable predictive insights, reduce incident impact, and enhance the reliability of production environments.

This is a hands-on, technical role where you’ll work closely with SRE, DevOps, data, and platform teams to embed intelligent automation into core operational workflows.

What You’ll Do :

Develop automation pipelines for anomaly detection, root cause analysis, and self-healing

Build integrations between monitoring systems and AI / ML models for predictive alerting

Engineer real-time observability pipelines (logs, metrics, traces) across distributed platforms

Deploy and manage tools such as OpenTelemetry, Prometheus, Grafana, Splunk, and Datadog

Extend telemetry coverage for LLM-based systems, APIs, and hybrid cloud environments

Implement event-driven workflows for incident remediation and automated recovery

Contribute to intelligent alerting standards, dashboarding, and escalation logic

Collaborate with SRE and DevOps teams to define and implement reliability automation

Document playbooks, remediation flows, detection rules, and AIOps patterns

Partner with platform and data science teams on AIOps architecture and telemetry modeling

What You’ll Bring :

Bachelor's degree in Computer Science, Engineering, or a related field

4-7 years of experience in SRE, DevOps, automation, or infrastructure roles

Hands-on experience with observability tools : Prometheus, Grafana, Splunk, OpenTelemetry

Proficient in scripting languages such as Python, Go, or Bash

Experience building CI / CD pipelines and integrating infrastructure telemetry

Working knowledge of Kubernetes, container operations, and cloud-native architectures

Familiarity with Azure (preferred), AWS, or GCP

Understanding of incident response workflows, system health checks, and auto-remediation

Must Have Skills :

Application & Microservice : Java, Spring Boot, API & Service Design

Any CI / CD Tools : GitLab Pipeline / Test Automation / GitHub Actions / Jenkins / CircleCI

App Platform : Docker & Containers (Kubernetes)

Any Databases : SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)

Any Messaging : Kafka, RabbitMQ

Any Observability / Monitoring : Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus

AIOps Skills : GitOps / ArgoCD / Flux

Nice To Have :

Fleet management across EKS / AKS, Databricks integration

Measure adoption (time-to-first-deploy)

Mentor / coach product teams

Multi-cloud identity federation (OIDC, SPIFFE)

Standardized compositions, lifecycle governance
