Job Title : Site Reliability Engineer (SRE) – AWS
Experience : 8+ years
Location : Chennai / Mumbai
Work Mode : Hybrid
Key Skills : AWS, Terraform, Kubernetes, Docker, Grafana, Prometheus, Datadog
Job Summary :
We are looking for a skilled Site Reliability Engineer (SRE) with strong AWS experience and a solid background in DevOps, automation, observability, and large-scale distributed systems.
Responsibilities :
- Manage and optimize cloud infrastructure using AWS IaaS.
- Implement SRE practices to enhance reliability, performance, and SDLC efficiency.
- Build and maintain CI/CD pipelines (Jenkins, GitLab, Terraform).
- Work with containers and orchestration (Docker, ECS, Kubernetes).
- Troubleshoot performance, networking, and distributed system issues.
- Drive DevOps and QA best practices across teams.
- Implement observability: SLIs/SLOs, error budgets, monitoring, logging, tracing, and alerting.
- Lead incident resolution and perform root cause analysis (RCA).
- Automate tasks using Python, Bash, or PowerShell.
- Collaborate effectively with cross-functional teams with minimal supervision.
Qualifications :
- Strong AWS cloud experience
- Proven DevOps & SRE implementation skills
- Good understanding of Linux, networking, and distributed systems
- Hands-on experience with observability tools
- Strong scripting and automation expertise
- Excellent communication and teamwork skills