Job Details :
Job Title : Sr. Site Reliability Engineer (SRE)
Duration : Contract to Hire (on the payroll of Datum Technology Group)
Location : Chennai || Mumbai || Gurugram
Interview Process : Virtual (2 Rounds) + 1 Technical screening.
Job Description :
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to enhance reliability, scalability, and performance across our cloud infrastructure, with a strong emphasis on cloud security, compliance, networking, and Linux operating system expertise.
This role combines reliability engineering with security best practices to ensure our cloud infrastructure is resilient, secure, and compliant.
Responsibilities :
Develop and maintain Infrastructure as Code (IaC) using Terraform, including advanced module design and best practices for highly complex environments.
Design and optimize CI/CD pipelines with a focus on automation, scalability, and deployment efficiency, drawing on prior experience to discuss and implement pipeline optimizations.
Collaborate with development teams to integrate security and observability tools into CI/CD pipelines, automating security checks.
Address vulnerabilities in code libraries and infrastructure (e.g., OS packages) through patching and remediation.
Partner with application teams to resolve specific security findings and improve overall system resilience.
Troubleshoot and debug networking issues across cloud and hybrid environments, applying a deep understanding of networking layers, components, and configurations.
Administer and optimize Linux-based operating systems, including troubleshooting, performance tuning, and implementing best practices for security and reliability.
Requirements :
6+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Engineering.
Deep knowledge of networking fundamentals, Linux operating systems, and CI/CD optimization strategies.
Very strong expertise in writing complex Terraform code, including advanced module design and best practices for large-scale, highly complex environments.
Proficiency in scripting or programming languages (e.g., Python, Bash, Go).
Hands-on experience with the Azure cloud platform.
Should be very strong in the following areas:
Basic networking concepts (OSI and TCP/IP models, IP addressing and subnetting, DNS, HTTP and HTTPS, etc.)
Linux OS and troubleshooting
Writing complex Terraform code
Azure cloud and CI/CD concepts
Bonus / Preferred Skills :
Experience with Docker and Kubernetes for containerization and orchestration.