Senior Site Reliability Engineer

Wits Innovation Lab, Mohali

Job description

Job Overview:

The Senior SRE will lead the implementation and management of the observability stack across cloud infrastructure, ensuring reliability, scalability, performance, and cost-efficiency. The role spans Kubernetes, AWS, automation, incident response, and platform reliability.

Key Responsibilities:

Build and maintain monitoring, logging, and alerting solutions.
Lead incident response & post-mortem best practices.
Design & test disaster recovery strategies.
Collaborate with dev teams to define SLAs.
Optimize cloud infrastructure (AWS) for cost and performance.
Automate deployments, scaling & recovery using Terraform, GitLab CI/CD, and Kubernetes.
Handle on-call support.

Required Skills & Experience

4+ years in SRE / DevOps.

Proficiency in Shell, Chef, Ansible, Python.

Strong AWS services experience (EC2, EKS, RDS, CloudWatch, Cognito, etc.).

Kubernetes administration in production.

IaC: Terraform / CloudFormation.

Observability tools: Prometheus, Grafana, ELK, tracing systems.

PostgreSQL (including replication).

Networking, load balancing, security best practices.

CI/CD pipelines & GitOps workflows.

Ability to handle high-pressure incidents.

Exposure to Splunk, Datadog, or Dynatrace is a plus.

Preferred:

AWS Certified Solutions Architect / DevOps Engineer.

Certified Kubernetes Administrator (CKA).

(ref: hirist.tech)
