Site Reliability Engineer - Incident Management

FxConsulting, Gurgaon

30+ days ago

Job description

Job Title : Site Reliability Engineer

Location : Gurgaon, India

Experience : 6 to 9 years

Employment Type :

About the Role :

We are seeking an experienced Site Reliability Engineer (SRE) to join our high-performance infrastructure and operations team. As an SRE, you will be responsible for ensuring the availability, scalability, performance, and reliability of our production systems. You will work closely with engineering, product, and platform teams to build robust monitoring systems, automate operational tasks, and drive incident management and root cause analysis.

Responsibilities

1. Incident & Alert Management :

Monitor production systems and handle alerts to ensure minimal service disruption.
Act as the first point of escalation for production incidents and critical system issues.
Drive rapid resolution of major incidents to restore services as quickly as possible.
Coordinate with cross-functional teams, vendors, and service providers to resolve outstanding incidents, following defined escalation procedures.

2. Monitoring & Observability :
Design, implement, and maintain application and infrastructure monitoring using tools such as OpenSearch, ELK, Grafana, Prometheus, PagerDuty, Pingdom, Datadog, and Splunk.
Ensure robust logging, metrics, and distributed tracing practices are in place to provide full observability into system performance.
Regularly review and refine monitoring configurations to align with evolving system needs.

3. Automation & Reliability Engineering :

Collaborate with product and platform engineering teams to develop SOPs (Standard Operating Procedures) for operational excellence.

Automate deployment, scaling, and operational tasks using tools like Ansible, Kubernetes, and CI / CD frameworks.

Implement proofs of concept (POCs) for new tools and technologies with the aim of integrating them into production.

4. Root Cause Analysis & Continuous Improvement :

Perform detailed root cause analysis for service-impacting events.

Identify trends and recurring issues to proactively improve system stability.

Contribute to post-incident reviews and recommend preventive measures.

5. Collaboration & Knowledge Sharing :

Work in a collaborative, Agile environment, actively participating in sprint planning, retrospectives, and technical discussions.

Seek expertise from domain specialists and share knowledge with peers.

Provide technical guidance to junior team members.

Skills & Qualifications

Technical Skills :

Monitoring & Observability Tools : Hands-on experience with OpenSearch, ELK, Grafana, Prometheus, PagerDuty, Pingdom, Datadog, and Splunk.

Programming / Scripting : Proficiency in at least two of the following: Python, Shell, Ansible (Golang is a plus).

Cloud & Infrastructure : Strong experience with AWS services, containerized applications, Kubernetes orchestration, and infrastructure.

CI / CD & Developer Tools : Experience with GitLab, Jenkins, and modern CI / CD.

System Architecture : Understanding of distributed systems, networking fundamentals, and high availability.

Soft Skills :

Strong problem-solving and analytical abilities.

Excellent communication and documentation skills.

Ability to work effectively in high-pressure situations and under tight deadlines.

Strong organizational skills with the ability to manage multiple priorities.

Qualifications :

Experience with large-scale, mission-critical production systems.

Familiarity with Agile methodologies and DevOps practices.

Prior experience driving POCs for production-scale technology adoption.

(ref : hirist.tech)
