Talent.com
Site Reliability Engineer

CodeKarma • Kanpur, Uttar Pradesh, India
10 days ago
Job description

Site Reliability Engineer (Multi-Cloud Deployments)

Location: Bangalore / Remote

Experience: 4–10 years

Type: Full-time (6-month probation)

About CodeKarma

CodeKarma is redefining how engineering teams understand and evolve complex systems by bringing production context directly into the developer’s workflow.

Our platform runs both as SaaS and as sub-account / on-prem deployments within our customers’ cloud environments.

We’re looking for engineers who can take ownership of these deployments end-to-end — from setup to monitoring, upgrades, and ongoing reliability.

What You’ll Do

You’ll be responsible for managing CodeKarma’s distributed deployments across client environments, ensuring reliability, security, and performance at scale.

Deploy and manage CodeKarma clusters across AWS, GCP, and Azure customer sub-accounts.

Monitor, upgrade, and maintain Kubernetes clusters and related infrastructure.

Implement observability, alerting, and disaster recovery for each deployment.

Handle CI/CD automation for platform releases, patches, and version upgrades.

Work closely with client engineering teams to adapt deployments to their environments, policies, and security constraints.

Diagnose and resolve environment-specific issues across networking, storage, and configuration layers.

Build and maintain infrastructure playbooks, Helm charts, and Terraform modules for standardized deployment.

What We’re Looking For

Strong experience managing Kubernetes clusters (EKS, GKE, AKS, or on-prem equivalents).

Deep understanding of Kubernetes internals, Helm, ingress controllers, networking, and storage classes.

Hands-on experience with CI/CD tools (GitHub Actions, ArgoCD, or similar).

Familiarity with monitoring and alerting stacks (Prometheus, Grafana, Loki, ELK, etc.).

Working knowledge of cloud infrastructure across AWS / GCP / Azure.

Ability to work directly with client engineering and DevOps teams, understanding their constraints and helping them integrate CodeKarma.

Strong debugging and communication skills — you’ll often be the bridge between CodeKarma and client infrastructure.

Why Join Us

Manage real, large-scale production environments across multiple enterprises.

Work directly with founders and senior engineers to shape how CodeKarma scales across clients.

High ownership, fast-moving environment, and exposure to deep-tech systems.

How to Apply

Please share:

A short summary of your Kubernetes experience (cluster management, scaling, debugging, etc.).

Any automation or deployment tooling you’ve built or maintained.

Links to your GitHub / GitLab / blog posts (if available).
