Job Overview
We are seeking an exceptional Senior IT Specialist to manage and resolve major incidents, provide expert troubleshooting, and ensure uptime and performance of our infrastructure.
- This fully remote position involves administering operating systems, supporting databases, and overseeing the overall IT infrastructure.
- The selected candidate will serve as the primary technical escalation point for all server-related major incidents (Major Incident Management, MIM) and P1 events.
Key Responsibilities
- Lead technical triage on bridge calls and in war rooms, coordinating efforts between teams to ensure swift resolution of incidents.
- Perform advanced, real-time troubleshooting to diagnose and resolve complex issues across Windows Server, Linux, and VMware virtualization platforms.
- Drive the restoration of critical infrastructure services with a focus on minimizing business impact.
- Author and deliver comprehensive Root Cause Analysis (RCA) and detailed post-incident reports.
- Mentor and provide technical guidance to junior teams to improve overall incident response capabilities.
- Participate in a 24x7 on-call rotation to provide critical support when needed.
Qualifications
- 5–8 years of hands-on experience in enterprise server administration and high-severity incident response.
- Expert-level knowledge of Windows Server and Linux, including RHEL and Ubuntu.
- Deep expertise with virtualization technologies, specifically VMware ESXi/vSphere in a large-scale environment.
- Solid understanding of core infrastructure concepts: TCP/IP networking, SAN/NAS storage, and enterprise backup/recovery solutions.
- Hands-on experience with enterprise monitoring platforms such as SolarWinds, Datadog, and Nagios.
- Proficiency with an ITSM tool, preferably ServiceNow, for incident lifecycle management.
Preferred Qualifications
- Advanced certifications such as MCSE, VCP, and RHCE.
- Experience with public cloud platforms and hybrid cloud environments.
- Scripting and automation skills for diagnostics and reporting.
Requirements
To succeed in this role, you should have:
- Crisis Management: Lead effectively in high-pressure situations.
- Collaboration: Work seamlessly with vendors, internal teams, and stakeholders.
- Analytical Mindset: Possess superior troubleshooting and RCA capabilities.
- Communication: Deliver clear, concise updates during incidents.
- Proactive Mindset: Focus on prevention and continuous service improvement.
About the Role
This is a fully remote position with a company-provided Virtual Desktop (VDI).
A dedicated and quiet workspace is essential for maintaining a professional environment during critical incident bridge calls.
You will be expected to provide your own reliable computer (laptop or desktop) capable of accessing the VDI, along with at least one monitor.