Senior Incident Management Reliability Engineer

Confidential • Hyderabad / Secunderabad, Telangana, India
6 days ago
Job description

About The Job

Our Team

Service Quality cultivates a culture of service excellence where quality is more than a benchmark – it's a shared purpose. Through synergistic collaboration, advanced monitoring, and empathetic customer advocacy, we strive to elevate every interaction and transform challenges into opportunities for growth.

Main Responsibilities

The Incident Management Reliability Engineer is responsible for ensuring the stability, resilience, and reliability of critical IT services. This role combines strong incident management expertise with reliability engineering principles to minimize disruptions, drive rapid recovery from major incidents, and continuously improve system performance and availability.

  • Incident Management
      • Lead the end-to-end management of Major Incidents (P1 / P2), ensuring timely resolution and effective stakeholder communication.
      • Act as command centre lead during critical outages, coordinating across technical and business teams.
      • Ensure accurate and detailed incident documentation, including root cause, timeline and resolution steps.
      • Drive post-incident reviews and ensure action items are implemented to prevent recurrence.
      • Maintain consistent communication and escalation processes aligned with ITSM best practices (e.g. ITIL).
  • Reliability Engineering
      • Collaborate with service owners and platform teams to enhance service reliability, observability, and fault tolerance.
      • Implement proactive monitoring, alerting, and automated recovery mechanisms.
      • Analyse incident trends and develop reliability improvement plans.
      • Participate in capacity planning, change reviews, and failure mode analysis to anticipate and mitigate risks.
      • Develop and track SLOs / SLIs / SLAs to measure service health and performance.
  • Continuous Improvement
      • Partner with problem management to identify recurring issues and lead root cause elimination initiatives.
      • Automate operational tasks and enhance service recovery using scripts, runbooks, and AIOps tools.
      • Contribute to the evolution of the Major Incident Process, ensuring best practices are embedded across the organization.
  • Key Performance Indicators
      • Mean Time to Resolve (MTTR) and Mean Time to Detect (MTTD).
      • Reduction in number and impact of recurring incidents.
      • Adherence to SLA / SLO targets.
      • Completion rate of post-incident actions.
      • Stakeholder satisfaction and transparency during incidents.

About You

  • Experience:
      • 15+ years' experience.
  • Preferred Certifications:
      • ITIL v4 or Service Operations certification.
      • SRE Foundation / Practitioner certification.
      • Cloud certifications (AWS, Azure, or GCP).
      • Incident Command System (ICS) or equivalent leadership training in crisis response.
  • Soft skills:
      • Communication (verbal and written).
  • Technical skills:
      • Virtualization
      • Cloud Technologies
      • Database
      • Networking
      • Containerization
      • Automation
      • Middleware / Scheduling
      • Infrastructure as code
  • Languages:
      • English

Pursue progress, discover extraordinary

    Better is out there. Better medications, better outcomes, better science. But progress doesn't happen without people – people from different backgrounds, in different locations, doing different roles, all united by one thing: a desire to make miracles happen. So, let's be those people.

    Watch our ALL IN video and check out our Diversity, Equity and Inclusion actions at sanofi.com!

    Pursue Progress. Discover Extraordinary.

    Join Sanofi and step into a new era of science – where your growth can be just as transformative as the work we do. We invest in you to reach further, think faster, and do what's never been done before. You'll help push boundaries, challenge convention, and build smarter solutions that reach the communities we serve. Ready to chase the miracles of science and improve people's lives? Let's Pursue Progress and Discover Extraordinary – together.

    At Sanofi, we provide equal opportunities to all regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, protected veteran status or other characteristics protected by law.

    Skills Required

    Reliability Engineering, Networking, SLAs, Automation, Cloud Technologies, Containerization, Proactive Monitoring, Incident Management, Virtualization, Database
