Position: Site Reliability Engineer (SRE)
Experience: 4–10 Years
Location: Chennai (Hybrid, 2 days in office)
Role Overview
We are seeking a Site Reliability Engineer (SRE) to lead reliability practices, keep systems scalable, and work with development teams to maintain highly available services.
Key Responsibilities
- Design, build, and operate reliable, scalable production services.
- Define and implement SLIs and SLOs.
- Lead incident management, root cause analysis, and postmortems.
- Automate infrastructure and operations using IaC tools.
- Enhance CI/CD workflows, observability, and monitoring.
- Collaborate with developers and platform engineers to implement best practices.
- Provide technical leadership in reliability strategies.
Requirements
- 4+ years of experience in SRE, DevOps, or Infrastructure Engineering.
- Strong coding experience (Go, Python, Java, Rust, or similar).
- Experience with Kubernetes in production environments.
- Expertise in Infrastructure as Code (Terraform, Crossplane).
- Hands-on with CI/CD (ArgoCD, CircleCI, GitHub Actions).
- Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, OpenTelemetry).
- Strong problem-solving and leadership skills.
Preferred Qualifications
- Experience promoting SRE best practices.
- Deep understanding of microservices architecture.
- Proficiency in Go, Python, or Bash scripting.
- Contributions to CNCF or open-source projects.