Role Responsibilities :
- Monitor system performance using tools like PagerDuty and Graylog to ensure uptime.
- Troubleshoot incidents in real time and perform root cause analysis to avoid recurrence.
- Oversee deployments and CI/CD pipeline execution for system updates.
- Collaborate with cross-functional teams for efficient release and issue resolution.
Key Deliverables :
- Maintain infrastructure stability and reduce system downtime.
- Drive proactive system performance improvements and monitoring strategies.
- Document resolutions and standard operating procedures for internal use.
- Mentor junior engineers and share knowledge on best practices.
Skills Required :
Linux, Change Management, Incident Management, Automation