Note : If shortlisted, you will be invited for initial rounds on 6th December'25 (Saturday) in :
We are seeking an experienced Incident Manager to lead end-to-end response for high-severity incidents across our global SaaS platforms. This role acts as the single point of coordination during outages, ensuring timely triage, clear stakeholder communication, rapid service restoration, and strong post-incident governance aligned to ITIL. The ideal candidate combines calm, decisive leadership with solid technical acumen in cloud-native environments.
Key Responsibilities :
- Act as Incident Commander for Sev1 / Sev2 events; run bridges / war rooms, drive parallel workstreams, and ensure clear decision logs and ownership.
- Assess business impact, prioritize recovery actions, and coordinate across Dev / SRE, Platform, Security, and Vendor teams.
- Issue timely internal / external communications (initial, updates, RCA / PIR) and maintain executive-ready status dashboards during incidents.
- Define and maintain Incident, Change, and Problem workflows, SOPs, and runbooks; ensure ITIL alignment and continuous improvement.
- Partner with Change Management to review risk, quality gates, and change freezes; reduce repeat incidents and change-related failures.
- Oversee access management during incidents (break-glass, least privilege) and conduct post-event access reviews.
- Own weekly / monthly reporting (KPIs / SLAs / SLOs, trend analysis, recurring faults) and drive corrective actions with owners and deadlines.
- Manage stakeholders and customers with calm, credible updates, action plans, and clear expectations.
- Capture lessons learned, update knowledge articles / runbooks, and coach teams on best practices.
Required Qualifications :
- Bachelor's degree in IT, Computer Science, or a related field.
- 6+ years in Incident Management within 24x7 global operations.
- ITIL Foundation certification (required); ITIL Intermediate / Managing Professional is a plus.
- Proven experience leading major incidents with multi-team coordination and executive communication.
- Excellent written and verbal communication; able to articulate complex issues to technical and non-technical audiences.
- Strong analysis, prioritization, and decision-making under pressure.
Must-Have Technical Knowledge :
- Microsoft Azure (Monitor, Log Analytics / App Insights) and core Kubernetes / AKS operations fundamentals.
- Windows & Linux operational basics.
- ServiceNow (Incident / Change / Problem) for ITSM processes.
- PagerDuty for on-call management and escalation policies.
- Salesforce for customer case and communication workflows.
Key Skills & Competencies :
- Customer-centric, process-driven, and results-oriented.
- Strong stakeholder management; ability to influence without authority.
- Structured, detail-oriented, and excellent at facilitation / bridge leadership.
- Comfortable working across time zones in a fast-moving environment.
KPIs Owned :
- MTTA / MTTR; incident volume and severity distribution.
- SLA / SLO adherence; repeat incident rate; problem backlog burn-down.
- Change failure rate linked to incidents; time to RCA / PIR closure.
- Timeliness and quality of communications; runbook coverage and freshness.
Desirable :
- Deeper Azure / Kubernetes / SRE experience (scaling, resiliency, observability).
- Advanced ITIL certifications (Change, Problem, Service Operations).
- Familiarity with monitoring / observability stacks (Prometheus / Grafana, Azure Monitor).
Additional Information :
- Weekend and on-call support may be required on a rotational basis.
- Coordination with customers and teams across multiple regions / time zones.
(ref : hirist.tech)