SRE Head

SID Global Solutions, India

4 days ago

Job description

Job Title: SRE Head

Experience Level: ~10 years

Role Type: Engineering / Reliability

Role Overview:

The SRE Head is responsible for leading and scaling the Site Reliability Engineering (SRE) function across the organization. This role defines the reliability strategy, standards, and practices to ensure high availability, performance, and resilience of critical systems. The SRE Head partners with engineering, infrastructure, and operations teams to embed reliability, observability, and continuous improvement across all services.

Key Responsibilities:

Lead and define the SRE strategy, operating model, and best practices across the organization.

Establish and maintain SLIs, SLOs, and SLAs to measure and ensure service reliability and performance.

Oversee incident management, post-incident reviews, and root cause analysis for major outages.

Drive resilience engineering, disaster recovery, and chaos engineering initiatives.

Collaborate with development, infrastructure, and operations teams to improve reliability and automation.

Lead efforts to improve observability, including metrics, logging, and tracing frameworks.

Foster a culture of proactive reliability, continuous learning, and blameless postmortems.

Mentor and guide SRE leads and engineers, building high-performing reliability teams.

Track and communicate reliability trends, key metrics, and risk areas to leadership.

Evaluate and adopt emerging tools and practices to enhance platform reliability and scalability.

Required Qualifications & Experience:

10+ years of experience in SRE, reliability engineering, or production operations in large-scale environments.

Proven expertise in availability management, incident response, and service continuity.

Strong technical understanding of cloud platforms (GCP / AWS / Azure), Kubernetes, CI/CD, and automation.

Proficiency in observability tools (e.g., Prometheus, Grafana, Dynatrace, Datadog, ELK, OpenTelemetry).

Experience implementing SLIs / SLOs, error budgets, and capacity planning frameworks.

Strong leadership, strategic thinking, and cross-functional collaboration skills.

Excellent communication, mentoring, and culture-building abilities.

Desirable Skills:

Experience in building and scaling SRE organizations or CoEs.

Exposure to performance engineering, cost optimization, and AIOps practices.

Deep understanding of network reliability, security resiliency, and compliance-driven uptime goals.

Certification in reliability or cloud architecture (e.g., Google SRE, GCP Professional Architect).
