Senior Site Reliability Engineer

GSPANN • Hyderabad, Telangana, India
Job description

GSPANN is hiring a Senior Site Reliability Engineer (SRE) to join our team in Pune or Hyderabad. This full-time role focuses on enhancing the reliability, scalability, and observability of global cloud-based systems through automation, performance tuning, and modern DevOps practices.

Role and Responsibilities

  • Manage and support production environments on cloud platforms, with a strong preference for Microsoft Azure.
  • Apply expertise in observability tools such as Dynatrace, Splunk, Datadog, Grafana, and New Relic to monitor system health.
  • Implement modern observability practices including end-to-end (E2E) instrumentation, telemetry, and unified dashboard creation.
  • Drive organizational change by influencing senior leadership and improving SRE practices company-wide.
  • Write automation scripts using Python (strongly preferred) to streamline operations and eliminate manual effort.
  • Deploy cloud infrastructure using tools like Ansible, Terraform, and Azure DevOps.
  • Work confidently with Continuous Integration / Continuous Deployment (CI/CD) tools such as GitLab, Jenkins, Bamboo, Travis CI, and CircleCI.
  • Operate and orchestrate containerized environments using Kubernetes and Docker.
  • Troubleshoot complex issues and provide reliable, scalable solutions.
  • Embrace continuous learning and demonstrate a strong passion for automation and process improvement.
  • Use logging stacks like ELK (Elasticsearch, Logstash, and Kibana), Loki, and Splunk to maintain visibility and traceability.
  • Influence organizational adoption of Infrastructure as Code (IaC) and CI/CD methodologies.
  • Define and monitor Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
  • Lead incident response efforts and perform Root Cause Analysis (RCA) to minimize recurrence.

Skills and Experience

  • Bachelor’s degree in Computer Science, Information Science, Engineering, or a related discipline.
  • 6+ years of experience in Site Reliability Engineering (SRE) or DevOps roles, with a focus on cloud-based production systems.
  • Ensure the availability, low latency, performance, and cost efficiency of global e-commerce platforms.
  • Design and maintain full-stack observability solutions, including dashboards and standardized instrumentation.
  • Implement advanced monitoring and alerting systems tailored for both internal engineering teams and external stakeholders.
  • Advocate for SRE best practices and promote operational excellence across teams and departments.
  • Collaborate with engineering, product, and operations teams to increase reliability and accelerate delivery timelines.
  • Build automation tools that support incident response, system recovery, and software delivery pipelines.
  • Track and maintain error budgets, achieve defined SLOs, and guarantee high uptime for mission-critical services.
  • Identify system bottlenecks and anomalies proactively, ensuring optimal performance under peak loads.
  • Automate infrastructure management to reduce costs and scale efficiently during traffic surges.
  • Lead strategic, cross-functional initiatives that enhance overall system architecture and reliability.