Site Reliability Engineer II (06/10/2025)

Zafin, India
Job description

Senior Site Reliability Engineer (SRE II)

Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE.

What you’ll do

  • SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish them, review them quarterly, and align teams to them.
  • Error budgeting (policy & tooling):
      • Run the error-budget policy with multi-window, multi-burn-rate alerts, backed by clear runbooks and paging thresholds.
      • Gate changes on budget status, with freeze/relax rules wired into CI/CD.
      • Maintain SLO/error-budget dashboards (Azure Monitor, Grafana/Prometheus, App Insights), and run weekly SLO reviews with engineering and product.
      • Drive roadmap tradeoffs when budgets are at risk, and land reliability epics.
  • Incidents without drama: Lead SEV1/SEV2 incidents, own communications, run blameless postmortems, and make corrective actions stick.
  • Engineer reliability in: Multi-AZ/multi-region patterns (active-active, DR), PodDisruptionBudgets (PDBs) and Pod Topology Spread, HPA/VPA/KEDA autoscaling, and resilient rollout/rollback.
  • AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, and run ingress (AGIC/NGINX); a service mesh is optional.
  • Observability that works: Metrics, traces, and logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, and OpenTelemetry. Alert on symptoms, not noise.
  • IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), and policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.
  • CI/CD reliability: Azure DevOps/GitHub Actions with canary and blue-green deployments, progressive delivery, auto-rollback, and Key Vault-backed secrets.
  • Capacity & performance: Load testing, right-sizing, and autoscaling; partner with FinOps to reduce spend without hurting SLOs.
  • DR you can trust: Define RTO/RPO, test backup and restore, run game days and chaos drills, and validate Azure Site Recovery (ASR) and multi-region failover.
  • Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, and shift-left checks in CI.
  • Reduce toil: Automate recurring ops, build self-service runbooks and ChatOps, and publish golden paths for product teams.
  • Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.
  • Document to scale: Architectures, runbooks, postmortems, and SLIs/SLOs, all kept current and discoverable.
  • Streaming/ETL reliability (if applicable): Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.

Minimum qualifications

  • Bachelor’s degree in CS/Engineering (or equivalent experience).
  • 12+ years in production ops/platform/SRE roles, including 5+ years on Azure.
  • PostgreSQL (must-have): Deep operational expertise, including HA/DR, logical and physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Experience with Azure Database for PostgreSQL – Flexible Server is preferred.
  • Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.
  • Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.
  • IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo); pipelines in Azure DevOps or GitHub Actions.
  • Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.
  • Mentorship and crisp written and verbal communication.

Preferred (nice to have)

  • Apache NiFi, Apache Flink, Apache Kafka, or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, and dead-letter/replay patterns.
  • Azure Solutions Architect Expert or CKA/CKAD certification.
  • ITSM (ServiceNow) and on-call tooling (PagerDuty/Opsgenie).
  • Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, and workload identity.
  • OpenTelemetry, eBPF tooling, or service mesh.
  • Multi-tenant SaaS and cost optimization at scale.