Site Reliability Engineer

GSPANN • Pune, Maharashtra, India
5 hours ago
Job description

GSPANN is hiring a Site Reliability Engineer (SRE) for its Pune or Hyderabad location. This full-time role focuses on enhancing the reliability of global e-commerce platforms through automation, observability, and cloud-native tools like Azure, Kubernetes, and Terraform.

Role and Responsibilities

  • Use monitoring tools such as Dynatrace, Splunk, Datadog, Grafana, or New Relic in hands-on scenarios.
  • Demonstrate strong knowledge of observability tools, trends, and technologies.
  • Identify gaps in SRE practices and implement scalable, effective solutions.
  • Support cloud-based production environments, with a preference for Microsoft Azure.
  • Write automation scripts proficiently, ideally using Python.
  • Utilize cloud deployment tools like Ansible, Terraform, and Azure DevOps effectively.
  • Work comfortably in containerized environments using Kubernetes and Docker.
  • Apply configuration management tools such as Chef, Ansible, or AWS CodeDeploy.
  • Troubleshoot complex issues independently and provide quick resolutions.
  • Use and configure observability dashboards and manage end-to-end (E2E) monitoring requirements.
  • Maintain expertise in cloud and automation tools (e.g., Azure, Python).
  • Leverage Continuous Integration/Continuous Deployment (CI/CD) and Infrastructure as Code (IaC) tools like GitLab, Jenkins, Ansible, Terraform, and Azure DevOps.
  • Exhibit soft skills including ownership, effective troubleshooting, and strong collaboration.
  • Define and monitor Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
  • Participate in incident response efforts and conduct Root Cause Analysis (RCA) after outages.

Skills and Experience

  • Bachelor's degree in Computer Science, Information Science, Engineering, or a related field.
  • 3–8 years of experience in a Site Reliability Engineering (SRE) or DevOps role.
  • Monitor global e-commerce platforms to ensure optimal availability, performance, and efficiency while managing emergency responses.
  • Promote observability best practices and drive operational excellence across systems.
  • Build and maintain comprehensive observability dashboards with end-to-end monitoring.
  • Design solutions and tools that enhance visibility for both internal teams and external stakeholders.
  • Establish instrumentation standards and develop repeatable implementation patterns for engineering teams.
  • Work closely with cross-functional teams to embed high-reliability practices into system design and operations.
  • Apply SRE principles to improve overall system performance and reduce incidents.
  • Automate incident response processes and coordinate outage preparedness across teams.
  • Maintain error budgets, meet SLOs, and ensure consistent uptime of mission-critical services.