Talent.com
Site Reliability Engineer (SRE) - Observability & Azure Infrastructure

Confidential • Hyderabad / Secunderabad, Telangana
30+ days ago
Job description

Key Responsibilities

  • Observability Platform Implementation:
      o Design and maintain distributed tracing, metrics, and logging using OpenTelemetry, Prometheus, Loki, and Tempo.
      o Ensure complete instrumentation of .NET Core applications for end-to-end visibility.
      o Implement telemetry pipelines for application logs, performance metrics, and traces.
  • Monitoring & Alerting:
      o Develop and manage SLIs, SLOs, and error budgets.
      o Create actionable, noise-free alerts using Prometheus Alertmanager and Azure Monitor.
      o Monitor key infrastructure components, applications, and databases with a focus on reliability and performance.
  • Azure & Infrastructure Integration:
      o Integrate Azure services (App Services, VMs, Storage, etc.) with the observability stack.
      o Configure monitoring for MSSQL databases, including performance tuning metrics and health indicators.
      o Use Azure Monitor, Log Analytics, and custom exporters where necessary.
  • Automation & DevOps:
      o Automate observability configurations using Terraform, PowerShell, or other IaC tools.
      o Integrate telemetry validation and health checks into CI/CD pipelines.
      o Maintain observability as code for repeatable deployments and easy scaling.
  • Resilience & Reliability Engineering:
      o Conduct capacity planning to anticipate scaling needs based on usage patterns and growth.
      o Define and implement disaster recovery strategies for critical Azure-hosted services and databases.
      o Perform load and stress testing to identify performance bottlenecks and validate infrastructure limits.
      o Support release engineering by integrating observability checks and rollback strategies in CI/CD pipelines.
      o Apply chaos engineering practices in lower environments to uncover potential reliability risks proactively.
  • Collaboration & Documentation:
      o Partner with engineering teams to promote observability best practices in .NET Core development.
      o Create dashboards (Grafana preferred) and runbooks for system insights and incident response.
      o Document monitoring standards, troubleshooting guides, and onboarding materials.

Required Skills and Experience

  • 4+ years of experience in SRE, DevOps, or infrastructure-focused roles.
  • Deep experience with .NET Core application observability using OpenTelemetry.
  • Proficiency with Prometheus, Loki, Tempo, and related observability tools.
  • Strong background in Azure infrastructure monitoring, including App Services and VMs.
  • Hands-on experience monitoring MSSQL databases (deadlocks, query performance, etc.).
  • Familiarity with Infrastructure as Code (Terraform, Bicep) and scripting (PowerShell, Bash).
  • Experience building and tuning alerts, dashboards, and metrics for production systems.

Preferred Qualifications

  • Azure certifications (e.g., AZ-104, AZ-400).
  • Experience with Grafana, Azure Monitor, and Log Analytics integration.
  • Familiarity with distributed systems and microservice architectures.
  • Prior experience in high-availability, regulated, or customer-facing environments.

Skills Required

  • Grafana, PowerShell, Terraform, .NET Core
