Talent.com
Site Reliability Engineer

CodeKarma • Anand, Gujarat, IN
13 hours ago
Job description

Site Reliability Engineer (Multi-Cloud Deployments)

Location: Bangalore / Remote

Experience: 4–10 years

Type: Full-time (6-month probation)

About CodeKarma

CodeKarma is redefining how engineering teams understand and evolve complex systems — bringing production context directly into the developer’s workflow.

Our platform runs both as SaaS and as deployments inside our customers' own environments, either in sub-accounts of their cloud providers or on-prem.

We’re looking for engineers who can take ownership of these deployments end-to-end — from setup to monitoring, upgrades, and ongoing reliability.

What You’ll Do

You’ll be responsible for managing CodeKarma’s distributed deployments across client environments — ensuring reliability, security, and performance at scale.

  • Deploy and manage CodeKarma clusters in customer sub-accounts on AWS, GCP, and Azure.
  • Monitor, upgrade, and maintain Kubernetes clusters and related infrastructure.
  • Implement observability, alerting, and disaster recovery for each deployment.
  • Handle CI/CD automation for platform releases, patches, and version upgrades.
  • Work closely with client engineering teams to adapt deployments to their environments, policies, and security constraints.
  • Diagnose and resolve environment-specific issues across networking, storage, and configuration layers.
  • Build and maintain infrastructure playbooks, Helm charts, and Terraform modules for standardized deployment.

What We’re Looking For

  • Strong experience managing Kubernetes clusters (EKS, GKE, AKS, or on-prem equivalents).
  • Deep understanding of Kubernetes internals, Helm, ingress controllers, networking, and storage classes.
  • Hands-on experience with CI/CD tools (GitHub Actions, ArgoCD, or similar).
  • Familiarity with monitoring and alerting stacks (Prometheus, Grafana, Loki, ELK, etc.).
  • Working knowledge of cloud infrastructure across AWS, GCP, and Azure.
  • Ability to work directly with client engineering and DevOps teams, understanding their constraints and helping them integrate CodeKarma.
  • Strong debugging and communication skills — you’ll often be the bridge between CodeKarma and client infrastructure.
Why Join Us

  • Manage real, large-scale production environments across multiple enterprises.
  • Work directly with founders and senior engineers to shape how CodeKarma scales across clients.
  • High ownership, fast-moving environment, and exposure to deep-tech systems.
How to Apply

Please share:

  • A short summary of your Kubernetes experience (cluster management, scaling, debugging, etc.).
  • Any automation or deployment tooling you’ve built or maintained.
  • Links to your GitHub or GitLab profile and any blog posts (if available).