Sr Engineer, Software - AIOps [T500-20351]

ANSR, Hyderabad, Telangana, India
3 days ago
Job description

About T-Mobile :

T-Mobile US, Inc. (NASDAQ : TMUS), headquartered in Bellevue, Washington, is America’s supercharged Un-carrier, connecting millions through its strong nationwide network and flagship brands, T-Mobile and Metro by T-Mobile. Customers benefit from an unmatched combination of value, quality, and exceptional service experience.

TMUS Global Solutions :

TMUS Global Solutions is a world-class technology powerhouse accelerating the company’s global digital transformation. With a culture built on growth, inclusivity, and global collaboration, the teams here drive innovation at scale, powered by bold thinking.

TMUS India Private Limited operates as TMUS Global Solutions.

About the Role :

As a Senior AIOps Engineer, you will be a key member of the CFL Platform Engineering and Operations team. You will help design and implement next-generation intelligent operations that support AI / ML platforms, LLM-based applications, and large-scale distributed systems. You’ll develop automation, observability, and remediation pipelines that enable predictive insights, reduce incident impact, and enhance the reliability of production environments.

This is a hands-on, technical role where you’ll work closely with SRE, DevOps, data, and platform teams to embed intelligent automation into core operational workflows.

What You’ll Do :

  • Develop automation pipelines for anomaly detection, root cause analysis, and self-healing
  • Build integrations between monitoring systems and AI / ML models for predictive alerting
  • Engineer real-time observability pipelines (logs, metrics, traces) across distributed platforms
  • Deploy and manage tools such as OpenTelemetry, Prometheus, Grafana, Splunk, and Datadog
  • Extend telemetry coverage for LLM-based systems, APIs, and hybrid cloud environments
  • Implement event-driven workflows for incident remediation and automated recovery
  • Contribute to intelligent alerting standards, dashboarding, and escalation logic
  • Collaborate with SRE and DevOps teams to define and implement reliability automation
  • Document playbooks, remediation flows, detection rules, and AIOps patterns
  • Partner with platform and data science teams on AIOps architecture and telemetry modeling

What You’ll Bring :

  • Bachelor's degree in Computer Science, Engineering, or a related field
  • 4-7 years of experience in SRE, DevOps, automation, or infrastructure roles
  • Hands-on experience with observability tools : Prometheus, Grafana, Splunk, OpenTelemetry
  • Proficient in scripting languages such as Python, Go, or Bash
  • Experience building CI / CD pipelines and integrating infrastructure telemetry
  • Working knowledge of Kubernetes, container operations, and cloud-native architectures
  • Familiarity with Azure (preferred), AWS, or GCP
  • Understanding of incident response workflows, system health checks, and auto-remediation

Must Have Skills :

  • Application & Microservice : Java, Spring Boot, API & Service Design
  • Any CI / CD Tools : GitLab Pipelines / Test Automation / GitHub Actions / Jenkins / CircleCI
  • App Platform : Docker & Containers (Kubernetes)
  • Any Databases : SQL & NoSQL (Cassandra / Oracle / Snowflake / MongoDB)
  • Any Messaging : Kafka, RabbitMQ
  • Any Observability / Monitoring : Splunk / Grafana / OpenTelemetry / ELK Stack / Datadog / New Relic / Prometheus
  • AIOps Skills : GitOps / ArgoCD / Flux

Nice To Have :

  • Fleet management across EKS / AKS, Databricks integration
  • Measure adoption (time-to-first-deploy)
  • Mentor / coach product teams
  • Multi-cloud identity federation (OIDC, SPIFFE)
  • Standardized compositions, lifecycle governance