Role Overview
We are seeking an experienced L3 Infrastructure Engineer to join our IT Operations team with a focus on Major Incident Management (MIM), incident request management, and rapid response for Priority-1 (P1) incidents. This role requires deep technical expertise in server, network, and virtualization technologies, strong troubleshooting skills, and the ability to lead incident resolution under pressure. The ideal candidate thrives in high-stakes environments, ensuring service restoration and business continuity for critical systems.
Key Responsibilities
- Act as the primary L3 escalation point for server-related major incidents (handled under MIM) and P1 incidents
- Lead technical bridge calls / war rooms, coordinating with L2 support, vendors, and cross-functional teams
- Perform advanced troubleshooting across Windows, Linux, and virtualization platforms (VMware, etc.)
- Manage recovery of critical infrastructure services
- Ensure root cause analysis (RCA) is completed and detailed post-incident reports are delivered
- Partner with Problem Management to identify proactive measures and prevent recurrence
- Follow ITIL standards for Incident, Problem, and Change Management
- Provide knowledge transfer and guidance to L1 / L2 support teams
Qualifications
- 5–8 years of experience in server administration and incident response
- Strong background in Windows Server (2016 / 2019 / 2022) and Linux (RHEL, Ubuntu, CentOS)
- Expertise with VMware ESXi / vSphere and enterprise server infrastructures
- Solid understanding of networking fundamentals, storage systems, and backup / recovery tools
- Experience with monitoring platforms
- Familiarity with ServiceNow or other ITSM tools for incident tracking and escalation
- Proven ability to handle high-severity incidents with calm, structured communication
- ITIL Foundation (required); ITIL Intermediate / Expert or related certifications (preferred)
- Microsoft, VMware, or Linux certifications (MCSE, VCP, RHCE) are a plus
Key Competencies
- Crisis management: Able to lead in high-pressure, time-critical situations
- Collaboration: Works effectively with vendors, internal teams, and stakeholders
- Analytical mindset: Strong troubleshooting and RCA capabilities
- Communication: Clear, concise updates during incidents, tailored for both technical and business audiences
- Proactive mindset: Focus on prevention and service improvement, not just resolution
The first 2 months are probationary; after that, the contract engagement is renewed yearly.
Why Join Us?
- Opportunity to play a critical role in keeping enterprise services running
- Be part of a world-class IT operations team delivering 24x7 global support
- Exposure to modern IT ecosystems across cloud, virtualization, and hybrid infrastructures