We are seeking a highly skilled Site Reliability Engineer (SRE) / DevOps Engineer with a strong background in cloud infrastructure, automation, and large-scale system operations. In this role, you will partner across engineering teams to enhance platform reliability, accelerate delivery, and ensure a world-class customer experience.
Key Responsibilities
Drive initiatives that enhance operational efficiency, scalability, and overall platform reliability.
Lead standardization efforts across services and disciplines in collaboration with embedded SRE teams.
Identify and implement automation opportunities for deployment, infrastructure management, and observability.
Apply modern security practices to ensure secure cloud-based infrastructure and software systems.
Perform full-stack diagnostics to identify and resolve the root causes of issues.
Analyze system performance and drive improvements to key operational metrics and KPIs.
Proactively assess infrastructure and applications for enhancements rather than waiting for direction.
Safeguard application data from unauthorized access, modification, or disclosure.
Build and maintain high-availability, redundant systems and disaster recovery procedures.
Develop integrated workflows to support internal support teams and cross-functional partners.
Own the customer experience: ensure seamless digital interactions and promote user satisfaction.
Respond to incidents promptly and support troubleshooting efforts across the stack.
Required Skills & Competencies
Cloud & Infrastructure
1+ years working with Infrastructure as Code (IaC) and DSC tools: Terraform, CDK, Chef.
1+ years deploying and managing containerized workloads with Docker and Kubernetes.
1+ years managing AWS infrastructure at scale: EC2, S3, ELB, Lambda, Route 53, ECS, SQS, CloudWatch.
Prior experience in a DevOps or SRE environment.
Automation & Scripting
Strong automation background with scripting languages including PowerShell, Ruby, Go, Python, and Bash.
Monitoring & Troubleshooting
Experience with large-scale monitoring and APM tools: ELK Stack, Dynatrace, New Relic, Nagios.
Skilled in IIS management, troubleshooting, and performance monitoring.
Experience supporting web farms in high-traffic SaaS environments.
Strong analytical, diagnostic, and problem-solving skills with a focus on proactive improvement.
Application & CI/CD
Extensive experience with .NET application architecture (caching, CDNs, load balancing, HA).
Clear understanding of SDLC processes and hands-on experience with CI/CD tools such as TeamCity, Octopus Deploy, GitHub, Jenkins, and Codefresh.
Additional Technologies (Preferred)
Active Directory, SSL, FTP, Big-IP F5
T-SQL, MongoDB, MySQL, SQL Server
Git, Chef, Salt
Kafka
Linux / Windows Server Administration
Apache, Bash
Engineer • Mohali, Punjab, India