Talent.com
Incident Management Reliability Engineer

Sanofi • Hyderabad / Secunderabad, Telangana, India
30+ days ago
Job description
About The Job

Our Team:

Service Quality cultivates a culture of service excellence where quality is more than a benchmark – it's a shared purpose. Through synergistic collaboration, advanced monitoring, and empathetic customer advocacy, we strive to elevate every interaction and transform challenges into opportunities for growth.

Main Responsibilities

The Incident Management Reliability Engineer is responsible for ensuring the stability, resilience, and reliability of critical IT services. This role combines strong incident management expertise with reliability engineering principles to minimize disruptions, drive rapid recovery from major incidents, and continuously improve system performance and availability.

  • Incident Management
      • Lead the end-to-end management of Major Incidents (P1/P2), ensuring timely resolution and effective stakeholder communication.
      • Act as command centre lead during critical outages, coordinating across technical and business teams.
      • Ensure accurate and detailed incident documentation, including root cause, timeline, and resolution steps.
      • Drive post-incident reviews and ensure action items are implemented to prevent recurrence.
      • Maintain consistent communication and escalation processes aligned with ITSM best practices (e.g. ITIL).
  • Reliability Engineering
      • Collaborate with service owners and platform teams to enhance service reliability, observability, and fault tolerance.
      • Implement proactive monitoring, alerting, and automated recovery mechanisms.
      • Analyse incident trends and develop reliability improvement plans.
      • Participate in capacity planning, change reviews, and failure mode analysis to anticipate and mitigate risks.
      • Develop and track SLOs/SLIs/SLAs to measure service health and performance.
  • Continuous Improvement
      • Partner with problem management to identify recurring issues and lead root cause elimination initiatives.
      • Automate operational tasks and enhance service recovery using scripts, runbooks, and AIOps tools.
      • Contribute to the evolution of the Major Incident Process, ensuring best practices are embedded across the organization.
  • Key Performance Indicators
      • Mean Time to Resolve (MTTR) and Mean Time to Detect (MTTD).
      • Reduction in number and impact of recurring incidents.
      • Adherence to SLA/SLO targets.
      • Completion rate of post-incident actions.
      • Stakeholder satisfaction and transparency during incidents.

About You

  • Experience:
      • 10+ years' experience.
  • Preferred Certifications:
      • ITIL v4 or Service Operations certification.
      • SRE Foundation / Practitioner certification.
      • Cloud certifications (AWS, Azure, or GCP).
      • Incident Command System (ICS) or equivalent leadership training in crisis response.
  • Soft skills:
      • Communication (verbal and written).
  • Technical skills:
      • Virtualization
      • Cloud Technologies
      • Database
      • Networking
      • Containerization
      • Automation
      • Middleware/Scheduling
      • Infrastructure as code
  • Languages:
      • English

Pursue progress, discover extraordinary

Better is out there. Better medications, better outcomes, better science. But progress doesn't happen without people – people from different backgrounds, in different locations, doing different roles, all united by one thing: a desire to make miracles happen. So, let's be those people.

At Sanofi, we provide equal opportunities to all regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, or gender identity.

Watch our ALL IN video and check out our Diversity, Equity and Inclusion actions at sanofi.com!

Skills Required
Containerization, Proactive Monitoring, Incident Management, Cloud Technologies, Database, Networking, Automation, Reliability Engineering, Virtualization