Talent.com
Senior Site Reliability Engineer (SRE) – Datadog Observability

Jade Global • Mumbai, Maharashtra, India
18 hours ago
Job description

Job Title : Senior Site Reliability Engineer (SRE) – Datadog Observability

Experience Required : 8+ years overall in SRE and Infrastructure Operations, with at least 3 years of hands-on experience with Datadog

Location : Hyderabad preferred; also open to Pune and remote

Job Summary :

We are seeking an experienced Site Reliability Engineer (SRE) to lead end-to-end SRE implementation initiatives with a strong focus on Datadog Observability. The ideal candidate will bring deep technical expertise in building reliable, scalable, and observable systems, along with hands-on experience integrating enterprise applications and middleware.

Key Responsibilities :

  • Drive end-to-end SRE implementation, ensuring system reliability, scalability, and performance.
  • Design, configure, and manage Datadog dashboards, monitors, alerts, and APM for proactive issue detection and resolution.
  • Utilize the Datadog Roles API to create and manage user roles, global permissions, and access controls for various teams.
  • Collaborate with product managers, engineering teams, and business stakeholders to identify observability gaps and design solutions using Datadog.
  • Implement automation for alerting, incident response, and ticket creation to improve operational efficiency.
  • Work closely with business and IT teams to support critical financial month-end, quarter-end, and year-end closures.
  • Leverage Datadog AI
  • Provide technical leadership in observability, reliability, and performance engineering practices.

Required Skills and Experience :

  • 8+ years of experience in Site Reliability Engineering and observability.
  • Minimum 3+ years of hands-on experience with Datadog (dashboards, APM, alerting, log management, Roles API, and monitoring setup).
  • Proven experience implementing SRE best practices: incident management, postmortems, automation, and reliability metrics.
  • Excellent stakeholder management and communication skills; experience collaborating with business and IT teams.
  • Strong problem-solving mindset and ability to work in high-pressure production support environments.

Preferred Qualifications :

  • Certification in Datadog or related observability platforms.
  • Knowledge of CI / CD tools and automation frameworks.
  • Experience in cloud platforms (AWS, Azure, or OCI).
  • Exposure to ITIL-based production support processes.