Join Verdantas – A Top #ENR 81 Firm!
Job Description: ITIL Problem Manager
Job Summary
The ITIL Problem Manager is a critical role within the IT Service Management (ITSM) framework, focused on identifying the root causes of incidents and preventing their recurrence. Unlike Incident Management, which is concerned with restoring service quickly, Problem Management is analytical and proactive. The Problem Manager leads the effort to minimize the adverse impact of incidents and problems on the business caused by errors within the IT infrastructure and to prevent the recurrence of incidents related to these errors. This role requires a unique blend of technical aptitude, analytical thinking, and excellent communication skills.
Key Roles and Responsibilities
The responsibilities of a Problem Manager fall into two main areas, Reactive and Proactive Problem Management, supported by reporting and communication duties.
A. Reactive Problem Management (Solving the Past)
This involves managing problems that have been identified through one or more incidents.
- Problem Identification & Logging:
  - Identify and log problems based on incident data, trends, and analysis.
  - Receive inputs from Major Incident reports, technical teams, and the Service Desk.
  - Ensure all problems are recorded with all necessary details in the Problem Management tool.
- Problem Categorization & Prioritization:
  - Categorize problems to identify trends and areas for improvement.
  - Prioritize problems based on their impact on business operations, urgency, and severity, often using a risk assessment matrix.
- Root Cause Analysis (RCA) Facilitation:
  - Lead and facilitate RCA sessions using techniques like 5 Whys, Ishikawa (Fishbone) Diagrams, Pareto Analysis, and Fault Tree Analysis.
  - Bring together the correct technical teams, stakeholders, and subject matter experts to collaboratively diagnose the root cause.
- Workaround Identification:
  - Work with technical teams to identify and document effective workarounds for known errors until a permanent fix is implemented.
  - Ensure workarounds are communicated to the Service Desk and incorporated into Known Error Records.
- Known Error Management:
  - Create and maintain Known Error Records (KERs) in the Known Error Database (KEDB).
  - Ensure KERs contain a clear description of the error, its root cause, and any workarounds or solutions.
- Resolution & Change Management:
  - Propose and coordinate the implementation of permanent fixes or solutions to resolve the root cause.
  - Raise Requests for Change (RFCs) and work closely with the Change Manager to ensure fixes are implemented safely and effectively.
B. Proactive Problem Management (Preventing the Future)
This involves seeking to identify and solve problems before incidents occur.
- Trend Analysis:
  - Analyze incident, event, and monitoring data to identify potential problems or weaknesses in the IT infrastructure before they cause significant incidents.
  - Perform trend analysis on recurring incidents to identify underlying problems.
- Risk Assessment:
  - Identify components in the IT environment that are at risk of failing and proactively address them.
  - Work with Capacity, Availability, and IT Service Continuity Management to prevent problems.
- Preventive Action:
  - Initiate preventive actions, such as software updates, patches, or hardware replacements, to avoid future incidents.
  - Contribute to the design of new services to ensure lessons learned from past problems are incorporated.
C. Reporting & Communication
- Management Reporting:
  - Produce regular management reports on Problem Management performance, including key metrics like:
    - Percentage of problems resolved within SLA.
    - Backlog of open problems.
    - Reduction in recurring incidents.
    - Cost savings from prevented incidents.
  - Report on the effectiveness of major problem resolutions and the ROI of proactive initiatives.
- Stakeholder Communication:
  - Communicate problem status, root causes, and resolution plans to key stakeholders, including management and business units.
  - Act as the central point of communication for major problem investigations.
- Process Ownership:
  - Maintain and continually improve the Problem Management process, policies, and procedures.
  - Ensure the process is aligned with ITIL best practices and integrated with other ITSM processes (Incident, Change, Knowledge Management).
Required Skills and Qualifications
Essential Skills:
- Analytical & Problem-Solving: Exceptional analytical skills with the ability to think logically and methodically to diagnose complex issues.
- Root Cause Analysis: Proven experience in leading RCA sessions and utilizing various RCA techniques.
- Communication: Excellent verbal and written communication skills, with the ability to explain technical issues and their business impact to non-technical stakeholders.
- Influence & Facilitation: Strong facilitation and influencing skills to lead cross-functional teams without direct managerial authority.
- ITSM Knowledge: In-depth understanding of the ITIL framework, specifically the Problem Management process and its interfaces with Incident, Change, and Knowledge Management.
- Documentation: Meticulous attention to detail for creating and maintaining accurate problem records, Known Error Records, and reports.
Qualifications:
- ITIL 4 Foundation certification is mandatory. ITIL 4 Specialist: Create, Deliver & Support or ITIL 4 Strategist: Direct, Plan & Improve are highly desirable.
- 5 years of experience in an IT Service Management role, with at least 2 years specifically in Problem Management or a similar analytical role (e.g., Major Incident Manager, Service Desk Analyst with RCA duties).
- Practical experience with ITSM platforms like ServiceNow, BMC Helix, Jira Service Management, etc.