Talent.com
Site Reliability Engineer II

Zafin, Trivandrum, India
25 days ago
Job description

Senior Site Reliability Engineer (SRE II)

Own availability, latency, performance, and efficiency for Zafin’s SaaS on Azure. You’ll define and enforce reliability standards, lead high-impact projects, mentor engineers, and eliminate toil at scale. Reports to the Director of SRE.

What you’ll do

SLIs/SLOs & contracts: Define customer-centric SLIs/SLOs for Tier-0/Tier-1 services. Publish, review quarterly, and align teams to them.

Error budgeting (policy & tooling):

Run the error-budget policy with multi-window, multi-burn-rate alerts; clear runbooks and paging thresholds.

Gate changes by budget status (freeze/relax rules) wired into CI/CD.

Maintain SLO/EB dashboards (Azure Monitor, Grafana/Prometheus, App Insights). Run weekly SLO reviews with engineering/product.

Drive roadmap tradeoffs when budgets are at risk; land reliability epics.

Incidents without drama: Lead SEV1/SEV2, own comms, run blameless postmortems, and make corrective actions stick.

Engineer reliability in: Multi-AZ/region patterns (active-active/DR), PDBs/Pod Topology Spread, HPA/VPA/KEDA, resilient rollout/rollback.

AKS at scale: Harden clusters (network, identity, policy), optimize node/pod density, ingress (AGIC/Nginx); mesh optional.

Observability that works: Metrics/traces/logs with Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana, OpenTelemetry. Alert on symptoms, not noise.

IaC & policy: Terraform/Bicep modules, GitOps (Flux/Argo), policy-as-code (Azure Policy/OPA Gatekeeper). No snowflakes.

CI/CD reliability: Azure DevOps/GitHub Actions with canary/blue-green, progressive delivery, auto-rollback, Key Vault-backed secrets.

Capacity & performance: Load testing, right-sizing, autoscaling; partner with FinOps to reduce spend without hurting SLOs.

DR you can trust: Define RTO/RPO, test backups/restore, run game days/chaos drills, validate ASR and multi-region failover.

Secure by default: Entra ID (Azure AD), managed identities, Key Vault rotation, VNets/NSGs/Private Link, shift-left checks in CI.

Reduce toil: Automate recurring ops, build self-service runbooks/chatops, publish golden paths for product teams.

Customer escalations: Be the technical owner on calls; communicate tradeoffs and recovery plans with authority.

Document to scale: Architectures, runbooks, postmortems, and SLIs/SLOs, all kept current and discoverable.

(If applicable) Streaming/ETL reliability: Apply SRE practices (SLOs, backpressure, idempotency, replay) to NiFi/Flink/Kafka/Redpanda data flows.

Minimum qualifications

Bachelor’s in CS/Engineering (or equivalent experience).

12+ years in production ops/platform/SRE, including 5+ years on Azure.

PostgreSQL (must-have): Deep operational expertise including HA/DR, logical/physical replication, performance tuning (indexes, EXPLAIN/ANALYZE, pg_stat_statements), autovacuum strategy, partitioning, backup/restore testing, and connection pooling (PgBouncer). Prefer experience with Azure Database for PostgreSQL – Flexible Server.

Azure core: AKS (must-have); Front Door/App Gateway, API Management, VNets/NSGs/Private Link, Storage, Key Vault, Redis, Service Bus/Event Hubs.

Observability: Azure Monitor/App Insights, Log Analytics, Prometheus/Grafana; SLO design and error-budget operations.

IaC/automation: Terraform and/or Bicep; PowerShell and Python; GitOps (Flux/Argo). Pipelines in Azure DevOps or GitHub Actions.

Proven incident leadership at scale, blameless postmortems, and SLO/error-budget governance with change gating.

Mentorship and crisp written/verbal communication.

Preferred (nice to have)

Apache NiFi, Apache Flink, Apache Kafka or Redpanda (self-managed on AKS or managed equivalents); schema management, exactly-once semantics, backpressure, dead-letter/replay patterns.

Azure Solutions Architect Expert, CKA/CKAD.

ITSM (ServiceNow), on-call tooling (PagerDuty/Opsgenie).

Compliance/SecOps (SOC 2, ISO 27001), policy-as-code, workload identity.

OpenTelemetry, eBPF tooling, or service mesh.

Multi-tenant SaaS and cost optimization at scale.
