Talent.com
Senior Site Reliability Engineer - Incident Management
Wits Innovation Lab • Mohali
30+ days ago

Job Description : Sr. Site Reliability Engineer (SRE)

We are seeking an experienced and results-driven Sr. Site Reliability Engineer (SRE) to join our team. The SRE will be responsible for ensuring the reliability, scalability, performance, and observability of our infrastructure and services.

This role requires strong expertise in cloud computing, Kubernetes, automation, monitoring, and incident management. The selected candidate will work closely with cross-functional teams to design and implement systems that are resilient, cost-effective, and efficient.

The ideal professional will have hands-on experience in designing and maintaining large-scale distributed systems and a proven track record in cloud-native operations. This position demands a proactive approach to automation, observability, disaster recovery, and incident response.

Key Responsibilities :

  • Reliability & Observability : Design, implement, and manage monitoring, logging, and alerting systems to improve visibility across environments. Utilize Prometheus, Grafana, ELK Stack, and distributed tracing tools to ensure system health.
  • Incident Management : Lead incident response efforts, participate in on-call rotations, resolve critical issues under pressure, and perform post-mortem analysis to improve future resilience.
  • Disaster Recovery & Scalability : Define and implement disaster recovery plans, conduct regular failover drills, and ensure infrastructure is designed for scalability and high availability.
  • Cloud Infrastructure Management : Operate and optimize environments hosted on AWS services including EC2, EKS, RDS, Cognito, and CloudWatch. Focus on cost-efficiency, reliability, and security.
  • Automation & Infrastructure as Code : Develop and maintain automation frameworks using Terraform or CloudFormation. Implement CI/CD and GitOps workflows with GitLab CI/CD to streamline deployments.
  • Kubernetes Administration : Manage production-grade Kubernetes clusters, perform upgrades, troubleshoot bottlenecks, and enforce best practices for high availability.
  • Database Operations : Administer PostgreSQL and similar databases, design replication strategies, ensure backup and recovery mechanisms, and monitor performance.
  • Networking & Security : Apply knowledge of networking protocols, load balancing, and security principles to protect and optimize infrastructure.
  • Cross-team Collaboration : Partner with development and QA teams to establish SLAs and SLOs for critical services, ensuring alignment of operational goals with business requirements.

Required Skills & Experience :

  • At least 4 years of experience as an SRE, DevOps Engineer, or equivalent role.
  • Strong expertise with AWS services such as EC2, EKS, RDS, Cognito, and CloudWatch.
  • Proficiency in Kubernetes administration in production environments.
  • Hands-on experience with Infrastructure as Code.
  • Strong scripting and automation abilities using Python and Bash.
  • Proficiency with observability stacks : Prometheus, Grafana, ELK.
  • Experience in building and maintaining CI/CD pipelines with GitLab CI/CD and GitOps workflows.
  • Solid knowledge of PostgreSQL administration and replication.
  • Understanding of networking fundamentals, load balancing, and security best practices.
  • Ability to manage incident response and prioritize multiple issues effectively.

Preferred Qualifications :

  • Experience with configuration management tools such as Chef or Ansible.
  • Familiarity with monitoring and observability solutions such as Splunk, Datadog, or Dynatrace.
  • Exposure to distributed tracing systems for performance troubleshooting.
  • Certifications including AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or Certified Kubernetes Administrator (CKA).

    (ref : hirist.tech)

