Talent.com
Incident Manager

Talentoj • Kottayam, IN
30+ days ago
Job description

Roles and Responsibilities:

  • Act as the primary point of contact for major incidents and escalations, ensuring rapid response and communication across technical and business teams.
  • Lead and coordinate incident resolution efforts involving multiple support teams and stakeholders to restore service as quickly as possible.
  • Manage the end-to-end incident lifecycle – detection, logging, categorization, prioritization, resolution, and closure.
  • Conduct detailed Root Cause Analysis (RCA) for high-severity incidents and drive implementation of permanent fixes.
  • Work closely with AWS cloud infrastructure teams to identify and resolve platform-level or configuration issues.
  • Collaborate with architecture and development teams to identify patterns, improve system reliability, and strengthen incident prevention strategies.
  • Develop and maintain incident management processes, playbooks, and metrics to improve response efficiency and reduce recurrence.
  • Manage communications and stakeholder expectations during critical incidents and post-incident reviews.
  • Participate in on-call rotations and ensure 24x7 support coverage as required.
  • Continuously drive improvements in monitoring, alerting, and automation to minimize incident impact and MTTR (Mean Time to Recovery).

Required Skills & Qualifications:

  • 8–14 years of experience in Incident Management / Production Support / Site Reliability / IT Operations roles.
  • Strong experience in managing incidents within complex distributed architectures and cloud-based environments (AWS preferred).
  • Expertise in AWS services such as EC2, S3, Lambda, CloudWatch, and RDS, along with related monitoring and logging tools.
  • Exposure to Redis and Elasticsearch for cache management, data indexing, and performance optimization.
  • Excellent communication and coordination skills to handle high-pressure situations and interact with senior stakeholders.
  • Proven ability to perform Root Cause Analysis (RCA) and implement corrective and preventive measures.
  • Experience with ITIL processes (Incident, Problem, Change Management).
  • Familiarity with tools such as ServiceNow, Jira, CloudWatch, and PagerDuty.
  • Strong analytical and problem-solving skills with a proactive approach to issue resolution.
  • Ability to work in 24x7 production support environments and handle critical incident escalations effectively.