Site Reliability Engineer

Xomiro Technologies, Delhi, IN

15 days ago

Job type

Remote

Job description

Description :

Role : Site Reliability Engineer (SRE)

Location : Remote-First (Bangalore) (Hybrid : Rare Office Attendance Only)

Work Mode : Permanent Night Shift

Experience Required : 6+ Years

Position Type : Individual Contributor

Role Summary :

We are seeking a Site Reliability Engineer (SRE) with 6+ years of experience to support the mission-critical operations of a large US banking client. This is a night shift role designed for proactive engineers who can combine production support with hands-on engineering and automation. You will monitor systems using observability tools, handle incidents, and collaborate closely with developers to ensure SLAs / SLOs are met.

You will not just respond to alerts, but actively improve reliability, automate repetitive tasks, and stabilize services. This is a remote-first role with rare office attendance (e.g., BCP drills or infra escalations). Candidates must be self-driven, capable of deep technical diagnosis, and comfortable with distributed collaboration. Note : This is a permanent night shift role. Candidates should confirm night shift readiness.

Must-Have Skills & Depth :

Monitoring & Observability - Must have configured or extensively used dashboards in tools like Grafana, Prometheus, ELK, or GCP Stackdriver to monitor availability, latency, and errors. Must be able to define alert thresholds, interpret log patterns, and correlate multi-source metrics. Knowing the above-mentioned tools is not necessary, and knowledge
Incident Management - Must have handled P1 / P2 incidents end-to-end : alert triage, participation in bridge calls, creating / updating Jira / SNOW tickets, stakeholder communication, and drafting RCA / post-mortem reports.
Automation & Scripting - Must have created or enhanced scripts in Bash or Python to automate health checks, alert suppression logic, log ingestion, or recovery steps. Should be able to write modular, reusable scripts.
Cloud Platform Experience (Preferred GCP) - Must have worked on production-grade services on at least one cloud platform (GCP preferred). Experience using cloud-native monitoring, logs (e.g., Stackdriver, CloudWatch), and basic resource inspection (VMs, storage, network health).
SLI / SLO Awareness - Should have tracked SLO breaches using observability platforms. Exposure to error budgets, latency SLIs, and availability metrics. No need to define them independently, but must interpret breaches.
Java Runtime Awareness - Must be able to analyze JVM logs (GC pauses, memory issues, thread deadlocks). Not required to perform tuning, but must detect symptoms and raise root cause hypotheses.
DB Performance Triage (Oracle / Postgres) - Must be able to spot DB-related errors or latency issues from logs / alerts. Not expected to tune queries, but must collaborate with DBAs / devs using evidence from logs or APM.
Dev Collaboration - Should have participated in daily ops / dev stand-ups, escalations, or RCA calls, contributing production context to code fixes or config changes. Strong communication expected.
Night Shift Readiness - Full alignment to US working hours is mandatory. Shift is fixed and non-rotational. Candidate must have experience working night shifts or must explicitly confirm readiness.

Nice-to-Have Skills :

Change Management - Exposure to ServiceNow, CAB processes, or deployment planning. Familiarity with structured release windows and rollback protocols.

Capacity Planning - Assisted in planning infra scale-up / down based on usage trends, using monitoring tools or dashboards (CPU, memory, traffic alerts).

CI / CD Integration - Familiar with embedding health checks, smoke tests, or SRE gates in Jenkins, GitHub Actions, or other pipelines.

Error Budget Automation - Exposure to setting or consuming automated alerts when services cross SLO budgets using tools like SLO Generator, Datadog, or custom scripts.

Terraform / IaC (Optional) - Able to read and interpret Terraform scripts, especially for monitoring agent deployment or alert rule provisioning. Not required to write from scratch.

GCP Certification - GCP Associate Cloud Engineer or a similar certification is a plus, not mandatory.

(ref : hirist.tech)
