Job Role : Sr DevOps – Observability and Monitoring
Experience : 10+ Years
Location : Mumbai (Onsite)
About the Role :
We are seeking an experienced Senior DevOps Observability and Monitoring Lead to design, implement, and manage comprehensive monitoring and observability solutions across our cloud and on-premises infrastructure. The role focuses on ensuring system reliability, performance, and proactive incident management through advanced monitoring, alerting, and observability strategies.
Key Responsibilities :
- Lead the design, deployment, and maintenance of observability frameworks across applications and infrastructure.
- Implement and manage monitoring, logging, tracing, and alerting solutions using tools such as Prometheus, Grafana, ELK Stack, Datadog, Splunk, or equivalent.
- Collaborate with development, QA, and operations teams to ensure performance, availability, and reliability of critical systems.
- Define and enforce best practices for monitoring, incident management, and observability across the organization.
- Develop dashboards, metrics, and reports to provide actionable insights to stakeholders.
- Implement automated alerting, anomaly detection, and root cause analysis processes.
- Optimize monitoring solutions for scalability, performance, and cost-efficiency.
- Mentor junior engineers and promote a culture of proactive system health and observability.
- Evaluate and recommend new tools and technologies to enhance observability and monitoring capabilities.
Key Skills and Qualifications :
- 10+ years of experience in DevOps, cloud infrastructure, and observability/monitoring roles.
- Strong hands-on experience with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog, Splunk, New Relic).
- Solid understanding of cloud platforms (AWS, Azure, GCP) and hybrid infrastructure.
- Experience with logging, tracing, and metrics collection for large-scale distributed systems.
- Strong scripting and automation skills (Python, Bash, PowerShell) for monitoring and alerting workflows.
- Knowledge of CI/CD pipelines, containerization (Docker), and orchestration (Kubernetes) is a plus.
- Excellent problem-solving, leadership, and stakeholder management skills.
- Proven experience in defining observability strategies and leading monitoring initiatives in enterprise environments.