Talent.com
Senior Site Reliability Engineer

Wits Innovation Lab, Mohali
30+ days ago
Job description

Job Overview:

The Senior SRE will lead the implementation and management of the observability stack across the cloud infrastructure, ensuring reliability, scalability, performance, and cost-efficiency. The role spans Kubernetes, AWS, automation, incident response, and platform reliability.

Key Responsibilities:

  • Build and maintain monitoring, logging, and alerting solutions.
  • Lead incident response and establish post-mortem best practices.
  • Design and test disaster recovery strategies.
  • Collaborate with development teams to define SLAs.
  • Optimize AWS cloud infrastructure for cost and performance.
  • Automate deployments, scaling, and recovery using Terraform, GitLab CI/CD, and Kubernetes.
  • Provide on-call support.

Required Skills & Experience:

  • 4+ years of experience in SRE/DevOps roles.
  • Proficiency in Shell and Python scripting, and with Chef and Ansible.
  • Strong experience with AWS services (EC2, EKS, RDS, CloudWatch, Cognito, etc.).
  • Kubernetes administration in production.
  • IaC: Terraform / CloudFormation.
  • Observability tools: Prometheus, Grafana, ELK, and tracing systems.
  • PostgreSQL (including replication).
  • Networking, load balancing, and security best practices.
  • CI/CD pipelines and GitOps workflows.
  • Ability to handle high-pressure incidents.
  • Exposure to Splunk, Datadog, or Dynatrace is a plus.

Preferred:

  • AWS Certified Solutions Architect or AWS Certified DevOps Engineer.
  • Certified Kubernetes Administrator (CKA).

(ref: hirist.tech)
