Site Reliability Engineer

CodeKarma, Delhi, India
12 days ago
Job description

Site Reliability Engineer (Multi-Cloud Deployments)

Location: Bangalore / Remote

Experience: 4–10 years

Type: Full-time (6-month probation)

About CodeKarma

CodeKarma is redefining how engineering teams understand and evolve complex systems — bringing production context directly into the developer’s workflow.

Our platform runs both as SaaS and as sub-account / on-prem deployments within our customers’ cloud environments.

We’re looking for engineers who can take ownership of these deployments end-to-end — from setup to monitoring, upgrades, and ongoing reliability.

What You’ll Do

You’ll be responsible for managing CodeKarma’s distributed deployments across client environments — ensuring reliability, security, and performance at scale.

Deploy and manage CodeKarma clusters across AWS, GCP, and Azure customer sub-accounts.

Monitor, upgrade, and maintain Kubernetes clusters and related infrastructure.

Implement observability, alerting, and disaster recovery for each deployment.

Handle CI/CD automation for platform releases, patches, and version upgrades.

Work closely with client engineering teams to adapt deployments to their environments, policies, and security constraints.

Diagnose and resolve environment-specific issues across networking, storage, and configuration layers.

Build and maintain infrastructure playbooks, Helm charts, and Terraform modules for standardized deployment.

What We’re Looking For

Strong experience managing Kubernetes clusters (EKS, GKE, AKS, or on-prem equivalents).

Deep understanding of Kubernetes internals, Helm, ingress controllers, networking, and storage classes.

Hands-on experience with CI/CD tools (GitHub Actions, ArgoCD, or similar).

Familiarity with monitoring and alerting stacks (Prometheus, Grafana, Loki, ELK, etc.).

Working knowledge of cloud infrastructure across AWS / GCP / Azure.

Ability to work directly with client engineering and DevOps teams, understanding their constraints and helping them integrate CodeKarma.

Strong debugging and communication skills — you’ll often be the bridge between CodeKarma and client infrastructure.

Why Join Us

Manage real, large-scale production environments across multiple enterprises.

Work directly with founders and senior engineers to shape how CodeKarma scales across clients.

High ownership, fast-moving environment, and exposure to deep-tech systems.

How to Apply

Please share:

A short summary of your Kubernetes experience (cluster management, scaling, debugging, etc.).

Any automation or deployment tooling you’ve built or maintained.

Links to your GitHub / GitLab / blog posts (if available).
