Talent.com
Senior Site Reliability Engineer (SRE) – Datadog Observability

Jade Global • Jamnagar, Gujarat, India
2 days ago
Job description

Job Title:

Senior Site Reliability Engineer (SRE) – Datadog Observability

Experience Required:

8+ years overall in SRE and infrastructure operations, including 3+ years of hands-on experience with Datadog

Location:

Hyderabad preferred; Pune and remote also considered

Job Summary:

We are seeking an experienced Site Reliability Engineer (SRE) to lead end-to-end SRE implementation initiatives with a strong focus on Datadog Observability. The ideal candidate will bring deep technical expertise in building reliable, scalable, and observable systems, with hands-on experience in integrating enterprise applications and middleware.

Key Responsibilities:

Drive end-to-end SRE implementation, ensuring system reliability, scalability, and performance.

Design, configure, and manage Datadog dashboards, monitors, alerts, and APM for proactive issue detection and resolution.

Utilize the Datadog Roles API to create and manage user roles, global permissions, and access controls for various teams.

Collaborate with product managers, engineering teams, and business stakeholders to identify observability gaps and design solutions using Datadog.

Implement automation for alerting, incident response, and ticket creation to improve operational efficiency.

Work closely with business and IT teams to support critical Financial Month-End, Quarter-End, and Year-End closures.

Leverage Datadog AI

Provide technical leadership in observability, reliability, and performance engineering practices.

Required Skills and Experience:

8+ years of experience in Site Reliability Engineering and Observability.

3+ years of hands-on experience with Datadog (dashboards, APM, alerting, log management, Roles API, and monitoring setup).

Proven experience implementing SRE best practices: incident management, postmortems, automation, and reliability metrics.

Excellent stakeholder management and communication skills; experience collaborating with business and IT teams.

Strong problem-solving mindset and ability to work in high-pressure production support environments.

Preferred Qualifications:

Certification in Datadog or related observability platforms.

Knowledge of CI/CD tools and automation frameworks.

Experience in cloud platforms (AWS, Azure, or OCI).

Exposure to ITIL-based production support processes.
