Talent.com
Senior Site Reliability Engineer - Azure Kubernetes Service
Peoplefy • Trivandrum
7 days ago
Job description

Site Reliability Engineer (SRE) - Azure / AKS Lead

Role Overview :

This is a senior technical leadership role for a Site Reliability Engineer (SRE) requiring 10+ years of experience, focused on owning and driving reliability for mission-critical, high-scale services deployed on Microsoft Azure.

The role requires prior experience as a DevOps Engineer followed by a move into a dedicated SRE function. The candidate must have expert knowledge of Azure, AKS (Azure Kubernetes Service), and modern reliability practices, including defining and enforcing SLIs / SLOs.

Based in Trivandrum, this SRE will shape technical standards, lead major incident response, and champion engineering excellence across multiple development teams.

Job Summary :

We are seeking an experienced SRE Lead (10+ years) with a strong background in Azure and AKS to ensure the highest levels of availability, performance, and scalability for our Tier-0 / Tier-1 services.

This role is responsible for establishing and maintaining core SRE practices, including defining error budgets, implementing multi-burn-rate alerting, driving continuous automation (Terraform / GitOps), and leading critical incident response with calm clarity. Expertise in observability, disaster recovery design (RTO / RPO), and cluster hardening is mandatory.
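For readers unfamiliar with the terms, the arithmetic behind an error budget and a burn rate can be sketched as follows. This is a minimal illustration, not part of the posting's requirements; the function names and the 30-day window are assumptions chosen for the example.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for an SLO target.

    `slo` is a fraction, e.g. 0.999 for "three nines".
    The 30-day default window is an assumption for illustration.
    """
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(error_ratio: float, slo: float) -> float:
    """How fast the error budget is being consumed.

    1.0 means the budget would be spent exactly at the end of the window;
    values above 1.0 mean it would run out early.
    """
    return error_ratio / (1.0 - slo)


if __name__ == "__main__":
    print(f"Budget: {error_budget_minutes(0.999):.1f} minutes per 30 days")
    print(f"Burn rate: {burn_rate(0.0144, 0.999):.1f}")
```

A 99.9% SLO over 30 days leaves roughly 43.2 minutes of budget, and an observed error ratio of 1.44% against that SLO corresponds to a burn rate of about 14.4.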

Key Responsibilities and Reliability Engineering Deliverables :

  • Service Level Management : Define SLIs / SLOs for Tier-0 / Tier-1 services and conduct quarterly reviews. Implement multi-window, multi-burn-rate alerts to precisely detect evolving service degradation.
  • Error Budgeting and Change Gating : Enforce reliability constraints by implementing change gating in CI / CD based on error budgets (using tools like Azure DevOps / GitHub Actions). Conduct weekly SLO reviews and drive the reliability roadmap.
  • Incident Management Command : Lead SEV1 / SEV2 incidents as the Incident Commander, taking ownership of rapid resolution, clear communication & postmortems. Ensure all corrective actions are implemented effectively.
  • Reliability Architecture & Kubernetes : Design and implement robust reliability patterns including DR (Disaster Recovery), multi-AZ / region configurations, HPA / VPA / KEDA for optimized scaling, and resilient deployment strategies like canary, blue-green, and rollback.
  • Cluster Hardening & Optimization : Drive cluster hardening initiatives (network, identity, policy). Optimize resource utilization and service density. Manage ingress traffic using AGIC / NGINX.
  • Observability Implementation : Implement comprehensive observability across metrics, traces, and logs using Azure Monitor, App Insights, Log Analytics, Prometheus, Grafana, and OpenTelemetry. Ensure alerts fire on symptoms, not noise.
  • Automation and Infrastructure as Code (IaC) : Automate platform provisioning using Terraform / Bicep. Implement GitOps (Flux / Argo) principles for deployment management and enforce compliance using Azure Policy / OPA Gatekeeper. Automate toil & build self-service runbooks / chatops.
  • Performance & Capacity Planning : Conduct rigorous load testing. Optimize platform autoscaling strategies and collaborate with FinOps to optimize cloud cost.
  • Disaster Recovery and Testing : Define RTO / RPO objectives. Ensure compliance by executing regular chaos drills & game days to validate resilience.
  • Security and Governance : Implement security best practices leveraging Entra ID (Azure AD), Key Vault rotation, and VNets / NSGs, and drive shift-left security practices within the CI pipeline.
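The multi-window, multi-burn-rate alerting and error-budget change gating named in the responsibilities above can be sketched as below. This is a hedged illustration only: the window pairs and thresholds (14.4, 6, 1) are the values commonly cited for a 30-day SLO window and are assumed here rather than taken from the posting, and `BurnRule`, `evaluate`, and `change_allowed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnRule:
    long_window: str   # window that confirms the burn is significant
    short_window: str  # window that confirms the burn is still happening
    threshold: float   # burn rate both windows must reach
    severity: str      # "page" or "ticket"


# Assumed thresholds for a 30-day SLO window (illustrative, not from the posting).
RULES = (
    BurnRule("1h", "5m", 14.4, "page"),
    BurnRule("6h", "30m", 6.0, "page"),
    BurnRule("3d", "6h", 1.0, "ticket"),
)


def evaluate(burn_rates: dict[str, float]) -> list[BurnRule]:
    """Return the rules that fire: both windows must be at or above threshold."""
    return [
        rule
        for rule in RULES
        if burn_rates.get(rule.long_window, 0.0) >= rule.threshold
        and burn_rates.get(rule.short_window, 0.0) >= rule.threshold
    ]


def change_allowed(budget_remaining: float, min_remaining: float = 0.0) -> bool:
    """Simple change gate: allow deploys only while error budget remains.

    `budget_remaining` is the fraction of the window's budget still unspent.
    """
    return budget_remaining > min_remaining


if __name__ == "__main__":
    observed = {"1h": 20.0, "5m": 18.0, "6h": 7.0, "30m": 2.0, "3d": 0.5}
    for rule in evaluate(observed):
        print(f"{rule.severity}: burn >= {rule.threshold} over "
              f"{rule.long_window} and {rule.short_window}")
    print("deploy allowed:", change_allowed(budget_remaining=0.25))
```

Requiring both the long and the short window to exceed the threshold is what keeps alerts tied to ongoing symptoms: the long window filters out brief blips, and the short window lets the alert clear quickly once the service recovers. In a CI / CD pipeline, a check like `change_allowed` would run as a gate step before deployment, fed by the current budget figure from the monitoring system.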

Mandatory Skills & Qualifications :

  • Experience : 10+ years of professional experience in Site Reliability or DevOps. Must have previously worked as a DevOps engineer and currently be working as an SRE.
  • Cloud Platform : Strong experience in Azure.
  • Container Orchestration : Strong experience with AKS (Azure Kubernetes Service) and experience working with Docker.
  • Database : Experience working on PostgreSQL (or similar enterprise-grade databases).
  • Observability : Strong experience with observability practices and tools (e.g., Azure Monitor, Grafana, Prometheus, App Insights).
  • IaC & Automation : Hands-on expertise with Terraform / Bicep and GitOps principles.

Preferred Skills :

  • Deep familiarity with Entra ID, Azure Policy, and Key Vault security integration.
  • Experience implementing OpenTelemetry standards for distributed tracing.
  • Certifications related to Azure or Kubernetes (e.g., Azure Administrator, CKA / CKAD).

(ref : hirist.tech)
