Senior Site Reliability Engineer

Confidential • Delhi, Mumbai, Kolkata

16 days ago

Job description

Build products that meet MVRs and reliability standards, ensuring system resilience and scalability.

Set up and operate observability tools across multiple cloud providers, incorporating AI-powered anomaly detection to enhance monitoring.

Help development teams define SLO/SLI dashboards and alerts, optimizing alert signals with ML-based noise-reduction techniques.

Use Go, Python, or Terraform to automate operational tasks and build self-healing mechanisms.

Manage and administer Grafana, Prometheus, Loki, and other observability tools, integrating predictive analytics where beneficial.

Troubleshoot and support production environments, using AI-assisted diagnostics where applicable to identify root causes faster.

Automate incident response workflows, leveraging AIOps to reduce manual toil and improve MTTR.

What You'll Need to Be Successful

Minimum of 5 years' experience in a SaaS environment.
Bachelor's degree or equivalent experience.
Ability to participate in an on-call rotation.
Strong understanding of networking (OSI model, TCP/IP, DNS), particularly in cloud environments.
Experience with Linux administration, security hardening, and performance tuning.
Passion for troubleshooting distributed systems and software failures.
Deep understanding of observability principles, including log analysis, tracing, and metrics correlation.
Strong background in infrastructure as code (Terraform, Pulumi) and container orchestration (Kubernetes, ECS, Nomad).
Interest in AI-powered automation, including AIOps tools, ML-based alert tuning, and predictive maintenance.
Experience with observability tools such as Prometheus, Grafana, or OpenTelemetry combined with ML-based anomaly detection is a plus.
Excellent technical writing skills for documenting architectures, processes, and automation workflows.

Skills Required

Terraform, SaaS, Kubernetes, Incident Response

Senior Site Reliability Engineer • Delhi, Mumbai, Kolkata