Lead SRE and DevOps initiatives, supporting development teams with CI / CD, automation, and infrastructure design across Azure environments.
Maintain Infrastructure as Code (IaC) standards; automate key rotation, backups, and configuration drift detection using pipelines.
Design and execute SRE projects including API versioning, security tool integrations (Human, Cequence, CrowdStrike), and sandbox-to-production alignment.
Drive observability and performance engineering—tuning alerts, dashboards, and executing performance and chaos testing for peak events and releases.
Oversee cost optimization and FinOps initiatives (FinOps 3.0), detecting anomalies and right-sizing Azure services for efficiency.
Manage scalability, availability, and DR readiness through auto-scaling, quarterly failover drills, and continuous capacity planning.
Provide on-call and incident management leadership, ensuring timely triage, RCA, and closure of blameless post-mortem action items.
Partner cross-functionally to document architectures, coordinate KTLO operations, and evaluate new technologies and AI adoption for reliability improvement.
Site Reliability Engineer • Hyderabad, Telangana, India