Associate Manager SRE

Confidential
Hyderabad / Secunderabad, Telangana, India
9 days ago
Job description

Overview

We are seeking a self-driven, inquisitive Site Reliability Engineer (SRE) to drive reliability, availability, performance, and security across our global digital product ecosystem. This role is central to ensuring a seamless and resilient experience for our users by blending deep engineering expertise with operational excellence and automation.

You will be part of a global SRE practice supporting a portfolio of 260+ modern cloud-native applications across consumer, commercial, supply chain, and enablement functions. Your mission: prevent incidents before they occur, ensure rapid recovery when they do, and build scalable systems that evolve with our growing business.

Responsibilities

  • Champion reliability, observability, and operational excellence across mission-critical applications.
  • Develop and maintain service-level indicators (SLIs), objectives (SLOs), and error budgets to measure and improve system performance.
  • Implement automated monitoring, alerting, and recovery mechanisms to reduce manual intervention and improve response times.
  • Collaborate closely with software engineering, platform, and operations teams to embed SRE practices across the development lifecycle.
  • Lead and participate in incident response, root cause analysis, and postmortem reviews to drive long-term improvements.
  • Identify and eliminate sources of toil through automation, tooling, and process refinement.
  • Continuously improve resiliency design, capacity planning, and release management in production systems.
  • Influence engineering teams with best practices on cloud-native architecture, observability, and deployment strategies.

Qualifications

Required Skills:

  • 5+ years of experience in production engineering, DevOps, or SRE roles.
  • Strong foundation in Linux systems, networking, and cloud platforms (Azure, AWS, or GCP).
  • Hands-on experience with observability tools (e.g., AppDynamics, Prometheus, Grafana, ELK, FullStory).
  • Proficiency in scripting or programming (e.g., Python, Bash, Go) and automation frameworks (e.g., Ansible, Terraform).
  • Deep understanding of CI/CD pipelines, release strategies, and deployment automation.
  • Experience in managing high-scale, distributed systems in cloud-native environments.
  • Strong analytical skills and a passion for continuous improvement.

Preferred Skills:

  • Familiarity with microservices, Kubernetes, containers, and service mesh architecture.
  • Exposure to incident and problem management frameworks (e.g., ITIL, RCA practices).
  • Experience working in global teams supporting mission-critical applications.

Skills Required:

  • Networking