Senior Site Reliability Engineer
Key Highlights
Build, scale, and optimize cloud-native infrastructure powering global, high-availability platforms
Drive automation-first engineering across AWS, Terraform, CI/CD, observability, and resilient systems
Own reliability, uptime, system health, costs, and performance across mission-critical environments
Strengthen DevSecOps practices to improve security, delivery velocity, and operational excellence
Lead major incident response, troubleshoot complex issues, and uphold production stability at scale
Position Overview
We are seeking a Senior Site Reliability Engineer to drive reliability, automation, and performance for large-scale, cloud-based platforms. This role blends deep technical engineering, systems thinking, DevOps collaboration, and operational leadership.
You will design and implement scalable infrastructure, improve observability, enhance resiliency, manage incident operations, and champion modern DevSecOps practices. This role plays a critical part in supporting tens of thousands of daily users while ensuring platforms remain secure, fast, and highly available.
Key Responsibilities
Cloud Engineering
Architect, deploy, and optimize AWS environments using automation and Infrastructure-as-Code
Build tooling that increases predictability, stability, and delivery speed
Optimize systems for scale, reliability, cost, and performance
Maintain repeatable, traceable, and transparent infrastructure through Terraform and automation
Monitor cloud spend and usage, ensuring alignment with service-level objectives
Observability & Reliability
Own uptime, reliability, system security, performance metrics, and golden signals
Lead incident management and triage bridges during major events
Enhance telemetry systems (New Relic, CloudWatch, Datadog) for deep operational visibility
Use data-driven analysis to improve system stability and customer experience
Ensure architecture and deployment patterns meet SLAs and reliability goals
DevSecOps & Automation
Strengthen CI/CD pipelines, code-review practices, and engineering standards
Partner with Cybersecurity to address vulnerabilities through automation
Support secure, consistent, and scalable delivery workflows across engineering teams
Resiliency Engineering
Identify failure points, blast-radius risks, and architectural gaps
Run failure-injection and chaos testing to validate resiliency
Forecast traffic, plan for seasonal peaks, and scale systems for 2x+ load scenarios
Drive improvements to infrastructure and software to meet resiliency targets
Leadership & Collaboration
Mentor engineers across levels, promoting high-quality engineering practices
Collaborate daily with product, engineering, and security teams in a DevOps model
Document, uplift, and share knowledge through cross-team forums and best practices
Qualifications
Experience as a software engineer with strong debugging and deployment skills
Hands-on expertise with AWS and Terraform (required)
Experience with ECS; Kubernetes/EKS experience strongly preferred
Strong proficiency in Python, Golang, Bash, and automation frameworks
CI/CD experience with Jenkins, GitHub Enterprise, CircleCI, or similar
Ability to troubleshoot across web servers, app servers, OS, networks, storage, and databases
Experience running large-scale, high-availability production systems
Strong communication, root-cause analysis, and incident leadership skills
BS in Computer Science or equivalent industry experience
About Us
We build scalable, secure, and high-performing digital platforms that power global user experiences. By combining cloud engineering, automation, observability, and resilient systems design, we help organizations operate more reliably, innovate faster, and support long-term platform stability and growth.
Why Join Us
Join a forward-thinking engineering organization where reliability, automation, and performance are core values. You’ll work with a modern cloud stack, collaborate with exceptional engineers, and own meaningful technical impact across large-scale applications. This is an opportunity to shape infrastructure strategy, elevate engineering practices, and build systems that support millions with consistency and excellence.
Senior Site Reliability Engineer • Delhi, India