Major Incident Manager (Escalation Management Team)
Location: Hyderabad
Experience: 8-15 years
Immediate Joiner preferred.
Kindly send your resume to nsenthil.kumar@genpact.com with the subject line "MIM", and mention your notice period.
Responsibilities
We are seeking a proactive and skilled Major Incident Manager to join our Escalation Management team. In this critical role, you will lead high-priority incident bridges to ensure rapid service restoration by coordinating with resolver groups and keeping stakeholders informed with timely updates. You will work closely with internal teams across SRE, Business Partners, R&D, Services, Sales, and Support, as well as with customers, to drive resolution of critical technical issues and provide executive-level visibility into incident status and customer impact. This role requires availability during CST hours and includes shift work and/or on-call responsibilities to ensure 24/7 incident coverage and timely communication to leadership.
- Serve as the first escalation point for the Event Management team and lead major incident bridges to ensure rapid service restoration.
- Act as the single point of contact for complex, high-priority escalations across global teams.
- Own and drive the end-to-end resolution of major incidents, including coordination with resolver groups and timely stakeholder communication.
- Collaborate with cross-functional teams (R&D, Product Management, Support, Sales, and Services) to troubleshoot issues and allocate appropriate resources.
- Monitor incident progress and ensure alignment with resolution timelines and customer expectations.
- Conduct Post-Incident Reviews and prepare customer-facing summaries and internal incident reports to capture lessons learned and drive improvements.
- Own and manage problems, ensuring timely updates, resolution, and closure.
- Partner with Engineering, P&T, and Process Owners to improve service stability and reduce incident recurrence.
- Analyze escalation trends and risks, contributing to the Problem Management lifecycle and continuous service improvement.
- Maintain clear communication with internal and external stakeholders via email and Microsoft Teams.
- Develop and maintain escalation management plans, including resource coordination and technical action plans.
- Initiate hierarchical escalations when necessary and ensure leadership engagement.
- Ensure accurate documentation of escalation activities and compliance with escalation policies.
- Validate customer satisfaction before closure and ensure post-resolution monitoring is completed.
- Provide event management support during low-incident periods.
- Participate in a shared 24x7 on-call rotation to ensure incident coverage and timely response.
- Adhere to the critical service level agreements defined for the project.
- Champion a culture of continuous improvement by challenging outdated processes, identifying inefficiencies, and driving structured, actionable plans for enhancement.
- Ensure strict adherence to critical service level agreements (SLAs) and operational standards.
- Demonstrate thought leadership by incorporating industry best practices from leading product and startup environments to enhance incident and major incident management processes.
- Promote the adoption of AI and automation to streamline outage management and improve response efficiency.
- Encourage open communication, proactively raise concerns, and collaborate cross-functionally to resolve systemic issues.
Qualifications
Minimum qualifications
Bachelor's degree required, preferably in Computer Science, Information Systems, or a related field.
Preferred qualifications
- Excellent verbal and written communication skills in English.
- Relevant years of experience in global Major Incident Management or a similar role, with a strong background in handling incidents across complex technical environments.
- Working knowledge of infrastructure components such as hypervisors, storage, databases, networking (TCP/IP, iSCSI, VMware VDS), and compute environments on both Windows and Linux platforms.
- Familiarity with cloud platforms including AWS, Azure, and GCP, with a solid understanding of core cloud and infrastructure concepts.
- Experience managing major incidents involving cloud services, infrastructure, and enterprise applications.
- Proficient in ServiceNow (Incident, Problem, Change, and Service Request modules), PagerDuty, Microsoft Teams, Power Automate, New Relic, Harness, and MS Copilot.
- Understanding of web and application servers (IIS, Apache, Tomcat) and database technologies such as Microsoft SQL Server.
- Exposure to monitoring tools like AppDynamics, SolarWinds, New Relic, SCOM, Nagios, or Zenoss.
- Basic scripting skills in PowerShell or similar tools.
- Hands-on experience with ITSM platforms, preferably ServiceNow.