Job Title: AWS Site Reliability Engineer (SRE)
Experience Required: 6–8 years
Interview Drive Details:
- Date: 13 August 2025
- Time: 10 AM to 4 PM
- Mode: In-person (face-to-face)
- Location: Bangalore
Role Overview
As an AWS SRE, you'll leverage DevOps and SRE best practices to build, automate, and maintain scalable, reliable cloud infrastructure. Your focus will be on elevating system performance, observability, and incident response while fostering operational excellence.
Key Responsibilities
- Define, monitor, and uphold Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to guide reliability efforts.
- Build and maintain infrastructure resilience through automation (infrastructure as code with Terraform and CloudFormation), on-call tooling, and self-healing practices.
- Monitor system health using tools such as Prometheus, Grafana, Datadog, CloudWatch, and the ELK Stack; establish proactive alerts that detect issues before they escalate.
- Lead incident response, including detection, troubleshooting, mitigation, and blameless postmortems.
- Perform capacity planning and performance optimization to accommodate growth and improve efficiency.
- Collaborate with development and operations teams to embed reliability into the software lifecycle and deployments.
- Optimize cost and performance without compromising operational effectiveness, using AWS-native solutions and observability.
- Support disaster recovery planning and fault tolerance, and ensure compliance with reliability standards.
Required Skills and Qualifications
- Bachelor's degree in Computer Science, IT, or a related field.
- 6–8 years of experience in SRE, DevOps, or infrastructure engineering, with strong exposure to AWS environments.
- Expertise in infrastructure automation (e.g., Terraform, CloudFormation), containerization, and orchestration platforms.
- Proficiency in one or more programming or scripting languages (e.g., Python, Go, Bash).
- Hands-on experience with monitoring, observability, and incident management tools (e.g., Prometheus, Grafana, CloudWatch, ELK, Datadog).
- Strong understanding of system design, distributed systems, networking, and performance tuning.
- Proven track record of managing production systems, leading incident response, and conducting blameless postmortems.
- Skilled in capacity planning, performance benchmarking, and cost optimization.
Preferred Qualifications
- AWS certifications such as AWS Certified DevOps Engineer or AWS Certified Solutions Architect.
- Familiarity with container orchestration platforms such as Amazon EKS or Kubernetes.
- Experience with on-call practices, runbook development, and SRE methodologies (SLIs, SLOs, error budgets).
- Exposure to chaos engineering or resilience testing frameworks.
Skills Required
CloudWatch, Terraform, CloudFormation, Prometheus, Go, Bash, Grafana, ELK Stack, Datadog, Python, AWS