Talent.com
Site Reliability Engineer - Incident Management

FxConsulting | Gurgaon
30+ days ago
Job description

Job Title : Site Reliability Engineer

Location : Gurgaon, India

Experience : 6 to 9 years

Employment Type :

About the Role :

We are seeking an experienced Site Reliability Engineer (SRE) to join our high-performance infrastructure and operations team. As an SRE, you will be responsible for ensuring the availability, scalability, performance, and reliability of our production systems. You will work closely with engineering, product, and platform teams to build robust monitoring systems, automate operational tasks, and drive incident management and root cause analysis.

Responsibilities :

1. Incident & Alert Management :

  • Monitor production systems and handle alerts to ensure minimal service disruption.
  • Act as the first point of escalation for production incidents and critical system issues.
  • Drive rapid resolution of major incidents to restore services as quickly as possible.
  • Coordinate with cross-functional teams, vendors, and service providers to resolve outstanding incidents following defined escalation ...

2. Monitoring & Observability :

  • Design, implement, and maintain application and infrastructure monitoring using tools such as OpenSearch, ELK, Grafana, Prometheus, PagerDuty, Pingdom, Datadog, and Splunk.
  • Ensure robust logging, metrics, and distributed tracing practices are in place to provide full observability into system performance.
  • Regularly review and refine monitoring configurations to align with evolving system needs.

3. Automation & Reliability Engineering :

  • Collaborate with product and platform engineering teams to develop SOPs (Standard Operating Procedures) for operational excellence.
  • Automate deployment, scaling, and operational tasks using tools like Ansible, Kubernetes, and CI / CD frameworks.
  • Implement proofs of concept (POCs) for new tools and technologies with the aim of integrating them into production.

4. Root Cause Analysis & Continuous Improvement :

  • Perform detailed root cause analysis for service-impacting events.
  • Identify trends and recurring issues to proactively improve system stability.
  • Contribute to post-incident reviews and recommend preventive ...

5. Collaboration & Knowledge Sharing :

  • Work in a collaborative, Agile environment, actively participating in sprint planning, retrospectives, and technical discussions.
  • Seek expertise from domain specialists and share knowledge with peers.
  • Provide technical guidance to junior ...

Skills & Qualifications :

  • Monitoring & Observability Tools : Hands-on experience with OpenSearch, ELK, Grafana, Prometheus, PagerDuty, Pingdom, Datadog, and Splunk.
  • Programming / Scripting : Proficiency in at least two of the following : Python, Shell, Ansible (Golang is a ...).
  • Cloud & Infrastructure : Strong experience with AWS services, containerized applications, Kubernetes orchestration, and infrastructure ...
  • CI / CD & Developer Tools : Experience with GitLab, Jenkins, and modern CI / CD ...
  • System Architecture : Understanding of distributed systems, networking fundamentals, and high-availability ...

Skills :

  • Strong problem-solving and analytical abilities.
  • Excellent communication and documentation skills.
  • Ability to work effectively in high-pressure situations and under tight deadlines.
  • Strong organizational skills with the ability to manage multiple ...

Qualifications :

  • Experience with large-scale, mission-critical production systems.
  • Familiarity with Agile methodologies and DevOps practices.
  • Prior experience driving POCs for production-scale technology adoption.
  • (ref : hirist.tech)

    Related jobs
    • Site Reliability Engineer-II
      Confidential | Noida, Delhi NCR
      Build a CI / CD stack, collaborating across the Dev and QA / Automation teams, and drive the organization to a new level of (daily / hourly) continuous delivery and deployment. Security is paramount to everything we do, ...
      Last updated: 6 days ago
    • Site Reliability Engineer
      Xebia | Ghaziabad, IN
      AWS Engineer with strong Python development and Chaos Engineering expertise. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault toler...
      Last updated: 24 days ago
    • Site Reliability Engineer
      Concord | Ghaziabad, IN
      Engineers (Individual Contributors). Strong SRE (Site Reliability Engineering). CI / CD, monitoring, automation, infrastructure as code, etc.
      Last updated: 16 days ago
    • Senior Site Reliability Engineer
      WSO2 | New Delhi, Delhi, IN
      Founded in 2005, WSO2 is the largest independent software vendor providing open-source API management, integration, and identity and access management (IAM) to thousands of enterprises in over 90 c...
      Last updated: 5 days ago
    • Senior Site Reliability Engineer
      Confidential | Delhi, Mumbai, Kolkata
      Build products with MVRs and reliability standards, ensuring system resilience and scalability. Set up and operate observability tools across multiple cloud providers, incorporating AI-powered anom...
      Last updated: 15 days ago
    • Site Reliability Engineer
      CorroHealth | Noida, Uttar Pradesh, India
      We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and systems administration, with a f...
      Last updated: 15 days ago
    • Site Reliability Engineer
      Amicon Hub Services | Delhi, Delhi, IN
      Manage and scale production systems hosted on. Automate operational tasks using. Improve system reliability and reduce manual interventions through automation. Collaborate with development teams to en...
      Last updated: 3 days ago
    • Command Center / Site Reliability Manager - Incident Management
      Zyoin Group | Gurgaon
      We are seeking a strategic and operationally strong Command Center / Site Reliability Manager to lead our global incident response and network operations functions. This leadership role is responsib...
      Last updated: 15 days ago
    • Xebia - Senior / Lead / Principal Site Reliability Engineer
      Xebia IT Architects India Pvt Ltd | Gurugram
      Role : Site Reliability Engineer. Experience Range : 7 - 12 Years. Location : Pune & Chennai, Bangalore, Gurgaon. Mode of Work : Hyb...
      Last updated: 30+ days ago
    • L3 Server Engineer – Major Incident Management
      Nextbridge IT Solutions | Delhi, IN
      We are seeking an experienced L3 Infrastructure Engineer to join our IT Operations team with a focus on Major Incident Management (MIM), incident request management, and rapid response for Priority...
      Last updated: 5 days ago
    • Site Reliability Engineer - Incident Management
      FxConsulting | Gurugram
      Job Title : Site Reliability Engineer. Location : Gurgaon, India. Experience : 6 to 9 years.
      Last updated: 30+ days ago
    • RELX - Site Reliability Engineer - IAC Terraform
      REED ELSEVIER INDIA (a part of RELX India Pvt Ltd) | Gurugram
      Job Description : Lead initiatives to identify and eliminate manual, repetitive tasks through automation and tooling. Develop s...
      Last updated: 16 days ago
    • Site Reliability Engineer - CI / CD
      hirezy.ai | Delhi, IN
      Remote
      Technical Skills : Programming : Proficiency in languages like Python, Bash, or Java is essential. Operating Systems : Deep understanding of Linux / Windows operating ...
      Last updated: 30+ days ago
    • Site Reliability Engineer - Chaos Management
      Xebia | Noida, Delhi, IN
      AWS Engineer with strong Python development and Chaos Engineering expertise. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault toler...
      Last updated: 5 days ago
    • Gemini Solutions - Site Reliability Engineer - Cloud Solutions
      Gemini Solutions Private Limited | Gurgaon
      Position Summary : In this role, you will play a crucial part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliab...
      Last updated: 19 days ago
    • Site Reliability Engineer - Azure / Cloud Services
      Leapwork India Private Limited | Gurugram
      At Leapwork, our vision is to break down the barriers between humans and computers through the world's most accessible automation platform. We are the leading global AI-powered visual test automation...
      Last updated: 15 days ago
    • Lead Site Reliability Engineer
      Confidential | Gurgaon / Gurugram
      We are looking for a skilled Snowflake Developer with 8+ years of experience in developing and managing data warehouse solutions using Snowflake. The ideal candidate should have expertise in stored ...
      Last updated: 16 days ago
    • Site Reliability Engineer
      Uplers | Ghaziabad, IN
      Uplers is hiring for one of the clients. SRE (Oracle Cloud Infrastructure). Remote | Mon–Fri | 10:30 AM – 7:30 PM IST. Use of personal device required. OCI cloud infrastructure using Terraform and GitL...
      Last updated: 22 days ago
    • Senior Site Reliability Engineer - ELK Expert
      iVedha Inc. | Ghaziabad, IN
      Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice. Must be available to work in the EST (US / Canada) Time Zone. Are you a Senior Site Reliability Engineer (SRE) with ...
      Last updated: 30+ days ago
    • Site Reliability Engineer / Lead Site Reliability Engineer
      Confidential | Noida, India
      BOLD is seeking professionals who will be responsible for performing the build and release activities with Microsoft Technology stack. This person will also manage CI / CD pipelines and automate the b...
      Last updated: 7 days ago