Site Reliability Engineer (Multi-Cloud Deployments)
Location: Bangalore / Remote
Experience: 4–10 years
Type: Full-time (6-month probation)
About CodeKarma
CodeKarma is redefining how engineering teams understand and evolve complex systems — bringing production context directly into the developer’s workflow.
Our platform runs both as SaaS and as sub-account / on-prem deployments within our customers’ cloud environments.
We’re looking for engineers who can take ownership of these deployments end-to-end — from setup to monitoring, upgrades, and ongoing reliability.
What You’ll Do
You’ll be responsible for managing CodeKarma’s distributed deployments across client environments — ensuring reliability, security, and performance at scale.
Deploy and manage CodeKarma clusters across AWS, GCP, and Azure customer sub-accounts.
Monitor, upgrade, and maintain Kubernetes clusters and related infrastructure.
Implement observability, alerting, and disaster recovery for each deployment.
Handle CI/CD automation for platform releases, patches, and version upgrades.
Work closely with client engineering teams to adapt deployments to their environments, policies, and security constraints.
Diagnose and resolve environment-specific issues across networking, storage, and configuration layers.
Build and maintain infrastructure playbooks, Helm charts, and Terraform modules for standardized deployment.
What We’re Looking For
Strong experience managing Kubernetes clusters (EKS, GKE, AKS, or on-prem equivalents).
Deep understanding of Kubernetes internals, Helm, ingress controllers, networking, and storage classes.
Hands-on experience with CI/CD tools (GitHub Actions, ArgoCD, or similar).
Familiarity with monitoring and alerting stacks (Prometheus, Grafana, Loki, ELK, etc.).
Working knowledge of cloud infrastructure across AWS / GCP / Azure.
Ability to work directly with client engineering and DevOps teams, understanding their constraints and helping them integrate CodeKarma.
Strong debugging and communication skills — you’ll often be the bridge between CodeKarma and client infrastructure.
Why Join Us
Manage real, large-scale production environments across multiple enterprises.
Work directly with founders and senior engineers to shape how CodeKarma scales across clients.
High ownership, fast-moving environment, and exposure to deep-tech systems.
How to Apply
Please share:
A short summary of your Kubernetes experience (cluster management, scaling, debugging, etc.).
Any automation or deployment tooling you’ve built or maintained.
Links to your GitHub / GitLab / blog posts (if available).