Site Reliability Engineer (SRE) Senior Role
Location: Mohali
Experience: 4+ years
We are looking for an experienced Site Reliability Engineer (SRE) to strengthen our cloud and infrastructure team. The role involves owning reliability, availability, and scalability of distributed platforms, while driving automation and observability best practices.
Key Responsibilities:
- Build, implement, and maintain monitoring, logging, and alerting systems across production and non-production environments.
- Lead incident management and root cause analysis, and drive improvements for faster recovery.
- Define and test disaster recovery and backup strategies.
- Partner with development and product teams to establish and enforce SLAs, SLOs, and SLIs.
- Optimize AWS cloud environments for performance, resilience, and cost-efficiency.
- Develop automation for provisioning, scaling, deployment, and recovery.
- Manage infrastructure using Terraform, GitLab CI/CD, Kubernetes, and related tooling.
- Participate in on-call rotations and incident handling.
Skills & Experience Required:
- 4+ years in SRE, DevOps, or cloud infrastructure roles.
- Hands-on with AWS services (EC2, EKS, RDS, Cognito, CloudWatch).
- Proficient in Kubernetes administration in production environments.
- Experience with Infrastructure as Code (Terraform, CloudFormation).
- Scripting in Python, Bash, or Shell.
- Familiarity with automation tools (Chef, Ansible).
- Strong observability background: Prometheus, Grafana, ELK, distributed tracing.
- Experience managing relational databases (PostgreSQL or similar, with replication).
- Solid understanding of networking, load balancing, and security practices.
- CI/CD exposure (GitOps, pipelines).
- Worked with tools like Splunk, Datadog, or Dynatrace.
Preferred Qualifications:
- AWS Certified Solutions Architect / DevOps Engineer.
- Certified Kubernetes Administrator (CKA).
(ref: hirist.tech)