Talent.com
SRE Observability Engineer
TerraGiG • Hyderabad, Telangana, India
5 days ago
Job description

We are looking for an SRE Observability Engineer.

About the Role :

Duration : Permanent

Location : Hyderabad

Timings : Full Time (As per company timings)

Notice Period : (Immediate Joiner - Only)

Experience : 6-10 Years

JD :

Position : SRE Observability Engineer

Exp : 5+ to 10 Years

Location : Hyderabad

Mandatory Skills : Observability, Grafana, and writing queries using Prometheus and Loki.

Job Description :

We are seeking a highly experienced and driven Senior Observability Engineer to lead the design, development, and maintenance of observability solutions across our infrastructure, applications, and services. As a Senior Observability Engineer, you will implement monitoring, logging, and tracing solutions that ensure the reliability, performance, and availability of our complex, distributed systems. You will collaborate with cross-functional teams, including Development, Infrastructure, DevOps, and SRE, to optimize system observability and improve our incident response capabilities.

Key Responsibilities :

  • Lead the Design & Implementation of observability solutions, including monitoring, logging, and tracing for both cloud and on-premises environments.
  • Drive the Development and maintenance of advanced monitoring tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics.
  • Implement Distributed Tracing frameworks like OpenTelemetry, Jaeger, or Zipkin, and enhance application performance diagnostics and troubleshooting.
  • Optimize Log Management and analysis strategies using tools like Elasticsearch, Splunk, Loki, and Fluentd, ensuring efficient log processing and insights.
  • Develop Advanced Alerting and anomaly detection strategies to proactively identify system issues, minimizing downtime and improving Mean Time to Recovery (MTTR).
  • Collaborate with Development & SRE Teams to enhance observability in CI/CD pipelines, microservices architectures, and across various platform environments.
  • Automate Observability Tasks by leveraging scripting languages such as Python, Bash, or Golang to increase efficiency and scale observability operations.
  • Ensure Scalability & Efficiency of monitoring solutions to manage large-scale distributed systems and handle evolving business requirements.
  • Lead Incident Response by providing actionable insights through observability data for effective troubleshooting and root cause analysis.
  • Stay Abreast of Industry Trends in observability, Site Reliability Engineering (SRE), and monitoring practices, continuously improving processes.

Required Qualifications :

  • 5+ years of hands-on experience in observability, SRE, DevOps, or a related field, with a proven track record of successfully managing complex, large-scale distributed systems.
  • Expert-level proficiency in observability tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics, with the ability to lead the design and implementation of these solutions at scale.
  • Advanced experience with log management platforms like Elasticsearch, Splunk, Loki, and Fluentd, and the ability to optimize log aggregation and analysis for better performance insights.
  • Deep expertise in distributed tracing tools such as OpenTelemetry, Jaeger, or Zipkin, with a focus on performance optimization and root cause analysis.
  • Extensive experience with cloud environments (preferably Azure, AWS, or GCP) and Kubernetes for deploying and managing observability solutions across modern, cloud-native infrastructures.
  • Advanced proficiency in scripting languages such as Python, Bash, or Golang, and strong experience with Infrastructure as Code (IaC) tools like Terraform and Ansible.
  • Strong understanding of system architecture, performance tuning, and troubleshooting complex production environments, with an emphasis on scalability and high availability.
  • Proven experience in leading and mentoring teams, providing technical direction, and driving the adoption of best practices for observability and monitoring.
  • Exceptional problem-solving skills, with a focus on providing actionable insights and data-driven decision-making.
  • Ability to lead high-impact projects, effectively communicate with stakeholders, and influence cross-functional teams.
  • Strong communication and collaboration skills; demonstrated ability to work closely with engineering teams, leadership, and external partners to meet observability and system reliability goals.

Preferred Qualifications :

  • Experience with AI-driven observability tools and anomaly detection techniques.
  • Familiarity with microservices, serverless architectures, and event-driven systems.
  • Proven track record of handling on-call rotations and incident management workflows in high-availability environments.
  • Relevant certifications in observability tools, cloud platforms, or SRE best practices are a plus.

Interested candidates, please share your resume at balkis.begam@terragig.in.
