About the Role
We are looking for a Cloud Site Reliability Engineer (SRE) with deep expertise in Amazon Web Services (AWS) to design, build, and maintain scalable, reliable, and secure cloud infrastructure. You will work closely with development, DevOps, and operations teams to ensure system uptime, performance, and cost efficiency.
Key Responsibilities
Reliability & Performance:
Design and maintain highly available, fault-tolerant systems on AWS using services like EC2, ECS, EKS, Lambda, RDS, and CloudFront.
Automation & Infrastructure as Code:
Implement and manage infrastructure with Terraform, CloudFormation, or CDK to ensure repeatability and scalability.
Monitoring & Incident Response:
Develop observability solutions using CloudWatch, Prometheus/Grafana, Datadog, or New Relic.
Define SLIs/SLOs/SLAs and manage on-call rotations for incident response and root-cause analysis.
CI/CD & Deployment:
Build automated pipelines with tools like Jenkins, GitHub Actions, AWS CodePipeline, or ArgoCD.
Security & Compliance:
Implement best practices for IAM, KMS, and VPC security, and support compliance requirements (SOC 2, ISO 27001, HIPAA, etc.).
Required Qualifications
Bachelor’s degree in Computer Science, Engineering, or equivalent experience
3–7+ years of experience in SRE, DevOps, or Cloud Infrastructure roles
Strong hands-on experience with AWS services (EC2, S3, RDS, Lambda, CloudFormation, etc.)
Proficiency in Terraform or other IaC tools
Strong scripting skills in Python, Bash, or Go
Experience with Kubernetes and container orchestration
Familiarity with monitoring, logging, and alerting systems
Understanding of networking, DNS, load balancing, and security principles
Site Reliability Engineer • Mysore, Karnataka, India