Job Details :
Job Title : Lead Site Reliability Engineer (SRE)
Duration : Contract to Hire (On the Payroll of Datum Technology Group)
Location : Chennai || Mumbai || Gurugram
Interview Process : Virtual (2 Rounds) + 1 Technical screening.
Job Description :
- We are seeking a highly skilled and experienced Lead Site Reliability Engineer (SRE) to drive reliability, scalability, and performance across our cloud infrastructure, with a strong emphasis on cloud security, compliance, networking, and operating systems expertise.
- This role blends reliability engineering with security best practices to ensure our cloud infrastructure is not only scalable and resilient but also secure and compliant.
Responsibilities :
- Develop and maintain Infrastructure as Code (IaC) using Terraform, including advanced module design and best practices for highly complex environments.
- Design and optimize CI/CD pipelines with a focus on automation, scalability, and deployment efficiency. Ability to discuss and implement pipeline optimizations from prior experience.
- Collaborate with development teams to integrate security and observability tools into CI/CD pipelines, automating security checks.
- Troubleshoot and debug networking issues, including deep understanding of networking layers, components, and configurations across cloud and hybrid environments.
- Administer and optimize Linux-based operating systems, including troubleshooting, performance tuning, and implementing best practices for security and reliability.
- Address vulnerabilities in code libraries and infrastructure (e.g., OS packages) through patching and remediation.
- Partner with application teams to resolve specific security findings and improve overall system resilience.
Requirements :
- 9+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Engineering.
- Some experience leading or managing a team of engineers.
- Deep knowledge of networking fundamentals, Linux operating systems, and CI/CD optimization strategies.
- Very strong expertise in writing complex Terraform code, including advanced module design and best practices for large-scale, highly complex environments.
- Proficiency in scripting or programming languages (e.g., Python, Bash, Go).
- Hands-on experience with the Azure cloud platform.
Bonus / Preferred Skills :
- Experience with Docker and Kubernetes for containerization and orchestration.