Key Deliverables :
- Monitor platform health, performance, utilization, and customer KPIs using tools like Dynatrace, SolarWinds, Grafana.
- Ensure end-to-end incident management, RCA documentation, and service request fulfillment.
- Implement and configure smart alerts and monitoring tools across application and infrastructure layers.
- Drive continuous process improvements, build SOPs, and maintain adherence to SLAs.
Role Responsibilities :
Serve 24x7 with the team for proactive problem resolution and escalation handling.Triage and log incidents, initiate bridge calls, and notify relevant stakeholders for major issues.Publish platform health dashboards and maintain incident lifecycle documentation.Design and deploy new monitoring solutions and write automation scripts using Python, Shell, or PowerShell.Skills Required
Noc Operations, Linux, Dynatrace, Sql