Talent.com
Senior Site Reliability Engineer - Incident Management

Wits Innovation Lab, Mohali
30+ days ago
Job Description : Sr. Site Reliability Engineer (SRE)

We are seeking an experienced and results-driven Sr. Site Reliability Engineer (SRE) to join our team. The SRE will be responsible for ensuring the reliability, scalability, performance, and observability of our infrastructure and services.

This role requires strong expertise in cloud computing, Kubernetes, automation, monitoring, and incident management. The selected candidate will work closely with cross-functional teams to design and implement systems that are resilient, cost-effective, and efficient.

The ideal candidate will have hands-on experience designing and maintaining large-scale distributed systems and a proven track record in cloud-native operations. This position demands a proactive approach to automation, observability, disaster recovery, and incident response.

Key Responsibilities :

  • Reliability & Observability : Design, implement, and manage monitoring, logging, and alerting systems to improve visibility across environments. Utilize Prometheus, Grafana, ELK Stack, and distributed tracing tools to ensure system health.
  • Incident Management : Lead incident response efforts, participate in on-call rotations, resolve critical issues under pressure, and perform post-mortem analysis to improve future resilience.
  • Disaster Recovery & Scalability : Define and implement disaster recovery plans, conduct regular failover drills, and ensure infrastructure is designed for scalability and high availability.
  • Cloud Infrastructure Management : Operate and optimize environments hosted on AWS services including EC2, EKS, RDS, Cognito, and CloudWatch. Focus on cost-efficiency, reliability, and security.
  • Automation & Infrastructure as Code : Develop and maintain automation frameworks using Terraform or CloudFormation. Implement CI / CD and GitOps workflows with GitLab CI / CD to streamline deployments.
  • Kubernetes Administration : Manage production-grade Kubernetes clusters, perform upgrades, troubleshoot bottlenecks, and enforce best practices for high availability.
  • Database Operations : Administer PostgreSQL and similar databases, design replication strategies, ensure backup and recovery mechanisms, and monitor performance.
  • Networking & Security : Apply knowledge of networking protocols, load balancing, and security principles to protect and optimize infrastructure.
  • Cross-team Collaboration : Partner with development and QA teams to establish SLAs and SLOs for critical services, ensuring alignment of operational goals with business requirements.

Required Skills & Experience :

  • At least 4 years of experience as an SRE, DevOps Engineer, or in an equivalent role.
  • Strong expertise with AWS services such as EC2, EKS, RDS, Cognito, and CloudWatch.
  • Proficiency in Kubernetes administration in production environments.
  • Hands-on experience with Infrastructure as Code.
  • Strong scripting and automation abilities using Python and Bash.
  • Proficiency with observability stacks : Prometheus, Grafana, ELK.
  • Experience in building and maintaining CI / CD pipelines with GitLab CI / CD and GitOps workflows.
  • Solid knowledge of PostgreSQL administration and replication.
  • Understanding of networking fundamentals, load balancing, and security best practices.
  • Ability to manage incident response and prioritize multiple issues effectively.

Preferred Qualifications :

  • Experience with configuration management tools such as Chef or Ansible.
  • Familiarity with monitoring and observability solutions such as Splunk, Datadog, or Dynatrace.
  • Exposure to distributed tracing systems for performance troubleshooting.
  • Certifications including AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or Certified Kubernetes Administrator (CKA).

    (ref : hirist.tech)
