
Site Reliability Engineer

iTheme Consulting Pvt Ltd, Delhi, IN

15 hours ago

Job type

Remote

Job description

Site Reliability Engineer (SRE)

We are hiring a Site Reliability Engineer (SRE) to support the night-time operations of a mission-critical banking platform for a US-based enterprise client. This is a permanent night shift role tailored for experienced engineers who thrive in production environments and bring a proactive approach to incident resolution and automation.

You will work on system monitoring, incident response, and platform stability, while also improving observability, creating automation scripts, and collaborating with developers and DevOps teams. You won't just respond to alerts; you'll help prevent them.

Work Mode: Permanent Night Shift

Note: This is a fixed night-shift role. Candidates must have prior night-shift experience or explicitly confirm their readiness to work permanent US time zone shifts.

Key Responsibilities:

Monitor system health, SLIs/SLOs, and infrastructure using tools such as Prometheus, Grafana, ELK, and Stackdriver.
Lead incident triage for P1/P2 alerts, engage in war rooms, update tickets (JIRA/SNOW), and participate in post-incident RCA documentation.
Create or enhance automation scripts (Bash/Python) for log ingestion, alert suppression, auto-recovery, and health checks.
Analyze application runtime issues (for example, by reviewing JVM logs, memory usage, GC pauses, or thread deadlocks) to support root cause analysis.
Participate in daily DevOps/SRE standups, collaborating closely with engineering teams to improve production reliability.
Handle database performance alerts (Oracle/Postgres) and collaborate with DBAs or developers to resolve backend bottlenecks.
Track and interpret SLO breaches, availability metrics, and system latencies to enforce production SLAs.

Core Skills & Expertise

Technical Skills:

Experience with Grafana, Prometheus, ELK Stack, or Stackdriver. Able to define alerts, read logs, and correlate cross-system issues.

Full ownership of P1/P2 incidents, including triage, ticketing, stakeholder communication, and RCA participation.

Proficient in Bash or Python scripting to automate routine SRE tasks and recovery workflows.

Experience managing production workloads on GCP, AWS, or Azure, with the ability to inspect cloud logs, VM status, networking, and storage configurations.

Familiar with concepts like error budgets, latency thresholds, and SLO tracking. Capable of interpreting breaches and reporting anomalies.

Able to spot symptoms of JVM issues, such as GC pauses, memory leaks, and thread contention, and raise appropriate diagnostics.

Identify backend delays or errors from logs and assist in pinpointing query or connection-related issues.

Strong communication skills to work with distributed teams during escalations, code fixes, or configuration changes.

Must be fully aligned to a permanent night shift (US time) and self-sufficient in a remote-first environment.

Nice-to-Have Skills:

Familiarity with ServiceNow, change advisory boards, rollback planning, and structured release processes.

Experience monitoring CPU, memory, and traffic metrics to recommend infrastructure scale-up/scale-down strategies.

Exposure to embedding SRE gates, smoke tests, or health validations in CI pipelines like Jenkins or GitHub Actions.

Basic understanding of tools such as SLO Generator or Datadog for automated error-budget tracking and alerting.

Can interpret Terraform code related to monitoring, infrastructure, or alert rules. Not required to author full modules.

Holding a GCP Associate Cloud Engineer or similar certification is a plus but not mandatory.

(ref: hirist.tech)


Related jobs

Site Reliability Engineer - Azure / Cloud Services

Leapwork India Private Limited, Gurgaon

At Leapwork, our vision is to break down the barriers between humans and computers through the world's most accessible automation platform. We are the leading global AI-powered visual test automation...

Last updated: 18 days ago

Site Reliability Engineer

Concord, Ghaziabad, IN

Engineers (Individual Contributors). Strong SRE (Site Reliability Engineering). CI / CD, monitoring, automation, infrastructure as code, etc.

Last updated: 18 days ago

Site Reliability Engineer

Xebia, Delhi, IN

AWS Engineer with strong Python development and Chaos Engineering expertise. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault toler...

Last updated: 27 days ago

Senior Site Reliability Engineer - ELK Expert

iVedha Inc., Delhi, IN

Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice. Must be available to work in the EST (US / Canada) Time Zone. Are you a Senior Site Reliability Engineer (SRE) with ...

Last updated: 30+ days ago

Site Reliability Engineer

Amicon Hub Services, Noida, Delhi, IN

Manage and scale production systems hosted on. Automate operational tasks using. Improve system reliability and reduce manual interventions through automation. Collaborate with development teams to en...

Last updated: 6 days ago

Site Reliability Engineer

CorroHealth, Noida, Uttar Pradesh, India

We are seeking a highly skilled Site Reliability Engineer (SRE) to join our team. The ideal candidate will have a deep understanding of both software engineering and systems administration, with a f...

Last updated: 19 days ago

Xebia - Senior / Lead / Principal Site Reliability Engineer

Xebia IT Architects India Pvt Ltd, Gurugram

Role: Site Reliability Engineer. Experience Range: 7 - 12 Years. Location: Pune & Chennai, Bangalore, Gurgaon. Mode of Work: Hyb...

Last updated: 30+ days ago

Site Reliability Engineer

BayOne Solutions, Noida, Delhi, IN

Role: Site Reliability Engineer. The CXE Site Reliability Engineering (SRE) team manages the CI / CD pipelines and cloud infrastructure, ensuring seamless deployment, monitoring, and maintenance. Howev...

Last updated: 21 hours ago

Azure Data Engineers - Site Reliability Engineering

GSPANN, Gurugram, India

Description: GSPANN is hiring Azure Data Engineers with expertise in Site Reliability Engineering (SRE) to optimize and automate large-scale data applications. The role involves ensuring system relia...

Last updated: less than 1 hour ago

RELX - Site Reliability Engineer - IAC Terraform

REED ELSEVIER INDIA (a part of RELX India Pvt Ltd), Gurugram

Job Description: Lead initiatives to identify and eliminate manual, repetitive tasks through automation and tooling. Develop s...

Last updated: 19 days ago

Site Reliability Engineer - Incident Management

FxConsulting, Gurugram

Job Title: Site Reliability Engineer. Location: Gurgaon, India. Experience: 6 to 9 years

Last updated: 30+ days ago

Site Reliability Engineer - AWS / Azure Cloud Services

SkyFlow, Delhi, IN

Skyflow is a data privacy vault company built to radically simplify how companies isolate, protect, and govern their customers' most sensitive data. With its global network of data privacy vaults, Sk...

Last updated: 7 days ago

Site Reliability Engineer - CI / CD

hirezy.ai, Delhi, IN

Remote

Technical Skills: Programming: Proficiency in languages like Python, Bash, or Java is essential. Operating Systems: Deep understanding of Linux / Windows operating ...

Last updated: 30+ days ago

Project Manager - Site Reliability

Hudson RPO, Delhi, IN

Role: SRE Project Manager. Location: Gurugram. The SRE Project Manager will be responsible for the planning, implementation, and tracking of SRE projects f...

Last updated: 15 days ago

Site Reliability Engineer

Exasoft, Ghaziabad, IN

Responsibilities and Requirements: Experience must be at least 10 years in SRE. Multi Cloud, Hybrid Cloud – on Data center sites. Experience with multiple operating systems (.Operating Systems, Kern...

Last updated: 1 day ago

Site Reliability Engineer - Chaos Management

Xebia, Gurgaon, Haryana, IN

Gemini Solutions - Site Reliability Engineer - Cloud Solutions

Gemini Solutions Private Limited, Gurugram

Position Summary: In this role, you will play a crucial part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliab...

Last updated: 22 days ago

Staff Engineer - Site Reliability

Dashhire, Delhi, IN

Remote

Responsibilities: The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services. Th...

Last updated: 30+ days ago