Greetings from Peoplefy!
We’re looking for an SRE who can own reliability for mission-critical services on Azure, shape standards, lead incidents with calm clarity, and drive engineering excellence across teams.
Experience : 10+ years
Location : Trivandrum
Requirements :
Strong site reliability experience
Previously worked as a DevOps engineer and currently working as an SRE
Strong experience in Azure
Strong experience with AKS
Experience working with Docker
Experience with observability (any tool)
Experience working on PostgreSQL
Responsibilities :
SLIs / SLOs & Error Budgets
Define SLIs / SLOs for Tier-0 / Tier-1 services & review quarterly
Implement multi-window, multi-burn-rate alerts
Change gating via CI / CD based on error budgets
Maintain Azure Monitor / Grafana / Prometheus / App Insights dashboards
Conduct weekly SLO reviews & drive reliability roadmap
Incident Management
Lead SEV1 / SEV2 incidents, own communication & postmortems
Ensure corrective actions are implemented
Reliability Engineering
Implement DR, multi-AZ / region patterns, HPA / VPA / KEDA, resilient rollouts
Cluster hardening (network, identity, policy), optimize density
Ingress : AGIC / Nginx
Observability
Metrics, traces, logs via Azure Monitor, App Insights, Log Analytics, Prometheus, Grafana, OpenTelemetry
Alerts on symptoms, not noise
Automation & IaC
Terraform / Bicep, GitOps (Flux / Argo), Azure Policy / OPA Gatekeeper
Automate toil & build self-service runbooks / ChatOps
CI / CD Reliability
Azure DevOps / GitHub Actions with canary, blue-green, rollback
Key Vault-backed secrets
Performance & Capacity
Load testing, autoscaling, FinOps collaboration
Disaster Recovery
Define RTO / RPO, run chaos drills & game days
Security
Entra ID, Key Vault rotation, VNets / NSGs, shift-left security in CI
Documentation
Runbooks, SLOs, postmortems, architectures, all kept current & accessible
Interested candidates, please share your updated resume at amruta.bu@peoplefy.com
Senior Site Reliability Engineer • Delhi, Delhi, India