Job Overview :
We are seeking a highly motivated and skilled IT Operations Analyst (L2 Support) to join our team and work on a critical project for our globally recognized consulting client. In this role, you will be responsible for providing technical support and operational expertise for their IT infrastructure and applications, with a strong focus on utilizing and managing EMS (Event Management Systems) tools. You will play a key role in proactively monitoring systems, identifying and resolving incidents, performing root cause analysis, and contributing to the overall stability and efficiency of their IT environment.
Responsibilities :
- Act as the second level of escalation for IT incidents, providing timely and effective technical support to resolve complex issues related to infrastructure, applications, and EMS tools.
- Analyze, diagnose, and troubleshoot incidents, aiming for first-time resolution whenever possible.
- Escalate incidents to L3 support or relevant teams when necessary, providing detailed information and context.
- Track and manage incidents through their lifecycle, ensuring timely updates and closure according to SLAs.
- Utilize and manage EMS tools (SolarWinds, Nagios, Splunk, Dynatrace, SCOM) to proactively monitor the health and performance of IT infrastructure, applications, and services.
- Configure and maintain monitoring rules, alerts, and dashboards within the EMS tools.
- Identify and analyze events, correlate them to potential incidents, and take proactive steps to prevent disruptions.
- Respond to alerts and notifications from EMS tools in a timely and efficient manner.
- Participate in root cause analysis (RCA) for recurring or critical incidents, identifying underlying issues and contributing to the development of permanent solutions.
- Implement and track corrective and preventive actions to minimize future incidents.
- Contribute to the knowledge base by documenting known issues, troubleshooting steps, and resolutions.
- Participate in the change management process by reviewing change requests, assessing potential impact, and providing technical input.
- Assist in the implementation and rollback of changes as required.
- Fulfill service requests related to IT operations, adhering to established procedures and SLAs.
- Create and maintain accurate and up-to-date documentation, including incident reports, troubleshooting guides, operational procedures, and knowledge base articles.
- Generate regular reports on system health, incident trends, and performance metrics using EMS tools and other reporting mechanisms.
- Collaborate effectively with other IT teams (L1 support, L3 support, development, networking, security) to resolve incidents and address operational issues.
- Communicate clearly and concisely with end-users and stakeholders regarding incident status, resolution progress, and planned maintenance activities.
- Identify opportunities for process improvement and automation within IT operations.
- Contribute to the development and implementation of automation scripts and tools to streamline routine tasks.
- Stay updated with the latest trends and best practices in IT operations and EMS tools.
Skills & Experience :
- Mandatory : 3-6 years of hands-on experience in IT Operations, providing L2 support for enterprise-level IT infrastructure and applications.
- Mandatory : Proven experience in utilizing and managing one or more enterprise-grade EMS (Event Management Systems) tools such as :
1. SolarWinds
2. Nagios
3. Splunk
4. Dynatrace
5. SCOM (System Center Operations Manager)
6. Other similar tools
- Strong understanding of ITIL framework concepts, particularly Incident Management, Problem Management, and Change Management.
- Solid understanding of operating systems (Windows Server, Linux).
- Basic understanding of networking concepts (TCP/IP, DNS, DHCP).
- Familiarity with monitoring concepts and methodologies.
- Experience with scripting languages (PowerShell, Bash, Python) for automation and troubleshooting is a plus.
- Basic knowledge of cloud platforms (AWS, Azure) is a plus.
- Experience with ticketing systems (ServiceNow, Jira Service Management).
- Excellent analytical and problem-solving skills with a systematic approach to troubleshooting.
- Strong attention to detail and the ability to follow procedures and document work accurately.
- Excellent communication (written and verbal) and interpersonal skills, with the ability to communicate technical information to both technical and non-technical audiences.
- Ability to work independently and as part of a team in a fast-paced environment.
- Willingness to work in shifts and provide on-call support as required.
Education :
- B.E. / B.Tech in Computer Science or Information Technology from a recognized institution.
Notice Period :
- Immediate to 15 days (candidates with a notice period of less than 30 days are preferred).
(ref : hirist.tech)