Position: Site Reliability Engineer (SRE)
Experience: 4–10 Years
Location: Chennai (Hybrid – 2 days in office)
Role Overview:
We are seeking a Site Reliability Engineer (SRE) to lead reliability practices, keep our systems scalable, and work with development teams to maintain highly available services.
Key Responsibilities
Design, build, and operate reliable, scalable production services.
Define and implement SLIs and SLOs.
Lead incident management, root cause analysis, and postmortems.
Automate infrastructure and operations using IaC tools.
Enhance CI/CD workflows, observability, and monitoring.
Collaborate with developers and platform engineers to implement best practices.
Provide technical leadership on reliability strategy.
Requirements
4+ years of experience in SRE, DevOps, or Infrastructure Engineering.
Strong coding experience (Go, Python, Java, Rust, or similar).
Experience with Kubernetes in production environments.
Expertise in Infrastructure as Code (Terraform, Crossplane).
Hands-on experience with CI/CD tools (ArgoCD, CircleCI, GitHub Actions).
Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, OpenTelemetry).
Strong problem-solving and leadership skills.
Preferred Qualifications
Experience promoting SRE best practices.
Deep understanding of microservices architecture.
Proficiency in Go, Python, or Bash scripting.
Contributions to CNCF or open-source projects.