Job Description
As a Lead SRE, you will architect, implement, and manage cloud and data reliability strategies across large-scale applications and client environments. This role emphasizes system scalability, resilience, automation, and mentoring junior engineers.
Responsibilities :
- Lead design and implementation of reliability frameworks for applications and data pipelines.
- Architect and secure scalable cloud infrastructure (AWS / Azure).
- Drive incident management, service availability, and disaster recovery planning.
- Automate provisioning and configuration using Terraform, Ansible, or CloudFormation.
- Oversee monitoring systems, SLO / SLI tracking, and cost optimization.
- Mentor junior SREs and collaborate with cross-functional engineering and data teams.
Requirements
Skills & Tools :
- 5–8 years of SRE / DevOps / Data Infrastructure experience.
- Expert in AWS (EC2, VPC, RDS, Lambda, S3) or Azure equivalents.
- Mastery of Terraform, Jenkins, Kubernetes, and scripting languages (Python, Bash).
- Exposure to big data reliability systems (Kafka, Airflow, Spark).
Eligibility :
- Bachelor’s / Master’s in Computer Science or Engineering.
- Experience leading reliability projects and ensuring SLA compliance in enterprise environments.