Job Role : Sr DevOps – Observability and Monitoring
Experience : 10+ Years
Location : Mumbai (Onsite)
About the Role :
We are seeking an experienced Senior DevOps Observability and Monitoring Lead to design, implement, and manage comprehensive monitoring and observability solutions across our cloud and on-premises infrastructure. The role focuses on ensuring system reliability, performance, and proactive incident management through advanced monitoring, alerting, and observability strategies.
Key Responsibilities :
Lead the design, deployment, and maintenance of observability frameworks across applications and infrastructure.
Implement and manage monitoring, logging, tracing, and alerting solutions using tools such as Prometheus, Grafana, ELK Stack, Datadog, Splunk, or equivalent.
Collaborate with development, QA, and operations teams to ensure performance, availability, and reliability of critical systems.
Define and enforce best practices for monitoring, incident management, and observability across the organization.
Develop dashboards, metrics, and reports to provide actionable insights to stakeholders.
Implement automated alerting, anomaly detection, and root cause analysis processes.
Optimize monitoring solutions for scalability, performance, and cost-efficiency.
Mentor junior engineers and promote a culture of proactive system health and observability.
Evaluate and recommend new tools and technologies to enhance observability and monitoring capabilities.
Key Skills and Qualifications :
10+ years of experience in DevOps, cloud infrastructure, and observability/monitoring roles.
Strong hands-on experience with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog, Splunk, New Relic).
Solid understanding of cloud platforms (AWS, Azure, GCP) and hybrid infrastructure.
Experience with logging, tracing, and metrics collection for large-scale distributed systems.
Strong scripting and automation skills (Python, Bash, PowerShell) for monitoring and alerting workflows.
Knowledge of CI/CD pipelines, containerization (Docker), and orchestration (Kubernetes) is a plus.
Excellent problem-solving, leadership, and stakeholder management skills.
Proven experience in defining observability strategies and leading monitoring initiatives in enterprise environments.