We are seeking a Senior Observability Engineer with strong expertise in Grafana and Python to lead telemetry, monitoring, and automation efforts across our cloud-native infrastructure.
This role is critical in shaping our observability strategy, building real-time dashboards, and automating alerting pipelines to ensure high system availability and performance.
Key Responsibilities:
- Design, develop, and maintain Grafana dashboards for real-time infrastructure and application monitoring.
- Build and enhance Python-based automation tools for telemetry data processing, health checks, and alerts.
- Integrate observability solutions with Azure Monitor, Log Analytics, Prometheus, and OpenTelemetry.
- Define and implement SLIs, SLOs, and proactive alerting mechanisms.
- Collaborate with SREs, DevOps, and developers to improve monitoring coverage and incident response.
- Contribute to infrastructure automation and CI/CD workflows using Python, Git, and DevOps tools.
- Lead tool selection, define observability best practices, and drive their adoption across engineering teams.
Requirements:
- 5+ years of experience in observability, DevOps, or SRE roles.
- Strong hands-on experience with Grafana, including templating, alerting, and data source integration.
- Proficient in Python scripting for automation and data processing.
- Experience with Prometheus, Azure Monitor, Log Analytics, and Kubernetes.
- Familiarity with distributed systems, tracing, and telemetry pipelines.
- Exposure to tools like Loki, OpenTelemetry, ArgoCD, or Terraform is a plus.
Nice to Have:
- Experience with CI/CD pipelines (Jenkins, Azure DevOps, GitHub Actions).
- Knowledge of containerized environments (Docker, Kubernetes, AKS).
- Ability to design cost-efficient monitoring solutions and dashboards.
Benefits:
- Fun, happy, and politics-free work culture built on the principles of lean and self-organisation.
- Work with large-scale systems powering global businesses.
- Competitive salary and benefits.
(ref: hirist.tech)