ANSR is hiring for one of its clients.
About T-Mobile:
T-Mobile US, Inc. (NASDAQ: TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.
TMUS Global Solutions:
TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.
TMUS India Private Limited is a subsidiary of T-Mobile US, Inc. and operates as TMUS Global Solutions.
About the Role:
As a Senior AIOps Engineer, you will be a key member of the CFL Platform Engineering and Operations team, helping design and implement next-generation intelligent operations that support AI/ML platforms, LLM-based applications, and large-scale distributed systems. You’ll develop automation, observability, and remediation pipelines that enable predictive insights, reduce incident impact, and enhance the reliability of production environments.
This is a hands-on, technical role where you’ll work closely with SRE, DevOps, data, and platform teams to embed intelligent automation into core operational workflows.
What You’ll Do:
Develop automation pipelines for anomaly detection, root cause analysis, and self-healing
Build integrations between monitoring systems and AI/ML models for predictive alerting
Engineer real-time observability pipelines (logs, metrics, traces) across distributed platforms
Deploy and manage tools such as OpenTelemetry, Prometheus, Grafana, Splunk, and Datadog
Extend telemetry coverage for LLM-based systems, APIs, and hybrid cloud environments
Implement event-driven workflows for incident remediation and automated recovery
Contribute to intelligent alerting standards, dashboarding, and escalation logic
Collaborate with SRE and DevOps teams to define and implement reliability automation
Document playbooks, remediation flows, detection rules, and AIOps patterns
Partner with platform and data science teams on AIOps architecture and telemetry modeling
What You’ll Bring:
Bachelor's degree in Computer Science, Engineering, or a related field
4-7 years of experience in SRE, DevOps, automation, or infrastructure roles
Hands-on experience with observability tools: Prometheus, Grafana, Splunk, and OpenTelemetry
Proficiency in programming and scripting languages such as Python, Go, or Bash
Experience building CI/CD pipelines and integrating infrastructure telemetry
Working knowledge of Kubernetes, container operations, and cloud-native architectures
Familiarity with Azure (preferred), AWS, or GCP
Understanding of incident response workflows, system health checks, and auto-remediation
Must-Have Skills:
Applications & Microservices: Java, Spring Boot, API & Service Design
Any CI/CD Tools: GitLab Pipelines / Test Automation / GitHub Actions / Jenkins / CircleCI
App Platform: Docker & Containers (Kubernetes)
Any Databases: SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)
Any Messaging: Kafka, RabbitMQ
Any Observability / Monitoring: Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus
AIOps Skills: GitOps / Argo CD / Flux
Nice to Have:
Fleet management across EKS / AKS; Databricks integration
Measuring adoption (e.g., time-to-first-deploy)
Mentoring / coaching product teams
Multi-cloud identity federation (OIDC, SPIFFE)
Standardized compositions, lifecycle governance
Sr Software Engineer • Hyderabad, India