Company Description
Nextbridge IT Solutions is a US-based IT solutions firm specializing in connecting exceptional talent with organizations driving transformation in infrastructure, cloud, and emerging technologies. We partner closely with clients to understand their technical needs and organizational goals, delivering tailored solutions through highly skilled professionals. Our culture values forward thinking, accountability, and agility, encouraging continuous growth and supporting long-term success. Join us to shape the future together.
Role Description
This is a remote contract role for an L3 Server Engineer – Major Incident Management. The L3 Server Engineer will be responsible for managing and resolving major incidents, providing expert troubleshooting, and ensuring the uptime and performance of infrastructure. Duties include administering operating systems, supporting databases, and overseeing the overall IT infrastructure. The role also requires effective communication and collaboration with other IT professionals and stakeholders to ensure swift resolution of incidents.
Key Responsibilities
- Serve as the primary technical escalation point for all server-related major incidents handled under Major Incident Management (MIM) and for P1 events.
- Lead technical triage on bridge calls and in war rooms, coordinating efforts between L2 support, application teams, vendors, and other cross-functional stakeholders.
- Perform advanced, real-time troubleshooting to diagnose and resolve complex issues across Windows Server, Linux, and VMware virtualization platforms.
- Drive the restoration of critical infrastructure services with a focus on minimizing business impact.
- Author and deliver comprehensive Root Cause Analysis (RCA) and detailed post-incident reports.
- Partner with the Problem Management team to identify trends, implement proactive solutions, and prevent incident recurrence.
- Mentor and provide technical guidance to L1/L2 support teams to improve overall incident response capabilities.
- Participate in a 24x7 on-call rotation to provide critical support when needed.
Qualifications
- 5–8 years of hands-on experience in enterprise server administration and high-severity incident response.
- Expert-level knowledge of Windows Server (2016/2019/2022) and Linux (RHEL, Ubuntu).
- Deep expertise with virtualization technologies, specifically VMware ESXi/vSphere in a large-scale environment.
- Solid understanding of core infrastructure concepts: TCP/IP networking, SAN/NAS storage, and enterprise backup/recovery solutions.
- Hands-on experience with enterprise monitoring platforms (e.g., SolarWinds, Datadog, Nagios).
- Proficiency with an ITSM tool, preferably ServiceNow, for incident lifecycle management.
- Demonstrated ability to remain calm, focused, and organized during high-pressure situations.
- ITIL v3/v4 Foundation certification is required.
Preferred
- Advanced certifications such as MCSE, VCP, RHCE.
- ITIL Intermediate/Expert or related certifications.
- Experience with public cloud platforms (Azure, AWS) and hybrid cloud environments.
- Scripting and automation skills (PowerShell, Bash) for diagnostics and reporting.
Key Competencies
- Crisis Management: Able to lead effectively in high-pressure, time-critical situations.
- Collaboration: Works seamlessly with vendors, internal teams, and stakeholders to achieve common goals.
- Analytical Mindset: Possesses superior troubleshooting and Root Cause Analysis (RCA) capabilities.
- Communication: Delivers clear, concise, and timely updates during incidents, tailored for both technical and business audiences.
- Proactive Mindset: Focuses on prevention and continuous service improvement, not just reactive resolution.
Remote Work Environment
This is a fully remote position. A company-provided Virtual Desktop (VDI) will be used for all work. Candidates are expected to provide their own reliable computer (laptop or desktop) capable of accessing the VDI, along with at least one monitor. A dedicated and quiet workspace is essential to maintain a professional environment during critical incident bridge calls.