Job Description:
Role: Site Reliability Engineer (SRE)
Location: Bangalore / Chennai / Pune (Hybrid)
Experience: 5+ years
Role Overview:
We are looking for a skilled SRE to ensure the reliability, scalability, and performance of our cloud-native applications. The ideal candidate has hands-on experience in cloud environments, container orchestration, infrastructure automation, and observability.
Key Responsibilities:
- Build, manage, and optimize cloud infrastructure (AWS, Azure, GCP).
- Deploy and manage containerized applications using Kubernetes (EKS / GKE / AKS) and Docker.
- Implement Infrastructure as Code (IaC) using Terraform, Ansible, and Helm.
- Set up monitoring, logging, and alerting solutions; implement distributed tracing, metrics collection, and log aggregation.
- Define and maintain SLOs, SLIs, and SLAs to measure and improve system reliability.
- Automate operational tasks, incident response, and workflows using Python, Bash, or Shell scripts.
- Collaborate with cross-functional teams to ensure high availability, scalability, and security of production systems.
Required Skills:
- Strong experience in cloud platforms: AWS, Azure, GCP.
- Hands-on with containerization & orchestration: Kubernetes, Docker.
- Expertise in IaC tools: Terraform, Ansible, Helm.
- Monitoring and observability tools: Prometheus, Grafana, ELK Stack, CloudWatch, Datadog.
- Solid understanding of SRE principles: SLIs, SLOs, SLAs, error budgets.
- Automation and scripting experience (Python, Bash, Shell).
Good-to-Have:
- Multi-cloud exposure.
- Experience with GitOps and CI/CD pipelines.
- Knowledge of security and compliance in cloud environments.
(ref: hirist.tech)