Senior Site Reliability Engineer - Incident Management

Wits Innovation Lab • Mohali

30+ days ago

Job description

Job Description : Sr. Site Reliability Engineer (SRE)

We are seeking an experienced and results-driven Sr. Site Reliability Engineer (SRE) to join our team. The SRE will be responsible for ensuring the reliability, scalability, performance, and observability of our infrastructure and services.

This role requires strong expertise in cloud computing, Kubernetes, automation, monitoring, and incident management. The selected candidate will work closely with cross-functional teams to design and implement systems that are resilient, cost-effective, and efficient.

The ideal candidate will have hands-on experience designing and maintaining large-scale distributed systems and a proven track record in cloud-native operations. This position demands a proactive approach to automation, observability, disaster recovery, and incident response.

Key Responsibilities :

Reliability & Observability : Design, implement, and manage monitoring, logging, and alerting systems to improve visibility across environments. Utilize Prometheus, Grafana, ELK Stack, and distributed tracing tools to ensure system health.
Incident Management : Lead incident response efforts, participate in on-call rotations, resolve critical issues under pressure, and perform post-mortem analysis to improve future resilience.
Disaster Recovery & Scalability : Define and implement disaster recovery plans, conduct regular failover drills, and ensure infrastructure is designed for scalability and high availability.
Cloud Infrastructure Management : Operate and optimize environments hosted on AWS services including EC2, EKS, RDS, Cognito, and CloudWatch. Focus on cost-efficiency, reliability, and security.
Automation & Infrastructure as Code : Develop and maintain automation frameworks using Terraform or CloudFormation. Implement CI/CD and GitOps workflows with GitLab CI/CD to streamline deployments.
Kubernetes Administration : Manage production-grade Kubernetes clusters, perform upgrades, troubleshoot bottlenecks, and enforce best practices for high availability.
Database Operations : Administer PostgreSQL and similar databases, design replication strategies, maintain backup and recovery mechanisms, and monitor performance.
Networking & Security : Apply knowledge of networking protocols, load balancing, and security principles to protect and optimize infrastructure.
Cross-team Collaboration : Partner with development and QA teams to establish SLAs and SLOs for critical services, ensuring alignment of operational goals with business requirements.

Required Skills & Experience :

At least 4 years of experience as an SRE, DevOps Engineer, or in an equivalent role.

Strong expertise with AWS services such as EC2, EKS, RDS, Cognito, and CloudWatch.

Proficiency in Kubernetes administration in production environments.

Hands-on experience with Infrastructure as Code.

Strong scripting and automation abilities using Python and Bash.

Proficiency with observability stacks : Prometheus, Grafana, ELK.

Experience in building and maintaining CI/CD pipelines with GitLab CI/CD and GitOps workflows.

Solid knowledge of PostgreSQL administration and replication.

Understanding of networking fundamentals, load balancing, and security best practices.

Ability to manage incident response and prioritize multiple issues effectively.

Preferred Qualifications :

Experience with configuration management tools such as Chef or Ansible.

Familiarity with monitoring and observability solutions such as Splunk, Datadog, or Dynatrace.

Exposure to distributed tracing systems for performance troubleshooting.

Certifications including AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or Certified Kubernetes Administrator (CKA).

(ref : hirist.tech)
