[Immediate Start] K8S Lifecycle Automation Eng

LTIMindtree • India
11 hours ago
Job description

Senior Kubernetes Platform Engineer (Zero-Touch GPU Cloud – GitOps Automation)

We are looking for a Senior Kubernetes Platform Engineer with 10+ years of infrastructure experience to design and implement the Zero-Touch Build, Upgrade, and Certification pipeline for our on-premises GPU cloud platform. This role focuses on automating the Kubernetes layer and its dependencies (e.g., GPU drivers, networking, runtime) using 100% GitOps workflows. You will work across teams to deliver a fully declarative, scalable, and reproducible infrastructure stack—from hardware to Kubernetes and platform services.

Key Responsibilities

  • Architect and implement GitOps-driven Kubernetes cluster lifecycle automation using tools such as kubeadm, Cluster API, Helm, and Argo CD.
  • Develop and manage declarative infrastructure components for:
      • GPU stack deployment (e.g., NVIDIA GPU Operator)
      • Container runtime configuration (containerd)
      • Networking layers (CNI plugins such as Calico and Cilium)
  • Lead automation efforts to enable zero-touch upgrades and certification pipelines for Kubernetes clusters and associated workloads.
  • Maintain Git-backed sources of truth for all platform configurations and integrations.
  • Standardize deployment practices across multi-cluster GPU environments, ensuring scalability, repeatability, and compliance.
  • Drive observability, testing, and validation as part of the continuous delivery process (e.g., cluster conformance, GPU health checks).
  • Collaborate with infrastructure, security, and SRE teams to ensure seamless handoffs between lower layers (hardware / OS) and the Kubernetes platform.
  • Mentor junior engineers and contribute to the platform automation roadmap.

Required Skills & Experience

  • 10+ years of hands-on experience in infrastructure engineering, with a strong focus on Kubernetes-based environments.
  • Primary skills required: Kubernetes API, Helm templating, Argo CD GitOps integration, Go / Python scripting, and containerd.
  • Deep knowledge of and hands-on experience with:
      • Kubernetes cluster management (kubeadm, Cluster API)
      • Argo CD for GitOps-based delivery
      • Helm for application and cluster add-on packaging
      • containerd as a container runtime and its integration with GPU workloads
  • Experience deploying and operating the NVIDIA GPU Operator or equivalent in production environments.
  • Solid understanding of CNI plugin ecosystems, network policies, and multi-tenant networking in Kubernetes.
  • Strong GitOps mindset with experience managing infrastructure as code through Git-based workflows.
  • Experience building Kubernetes clusters in on-premises environments (as opposed to managed cloud services).
  • Proven ability to scale and manage multi-cluster, GPU-accelerated workloads with high availability and security.
  • Solid scripting and automation skills (Bash, Python, or Go).
  • Familiarity with Linux internals, systemd, and OS-level tuning for container workloads.
  • Bonus:
      • Experience with custom controllers, operators, or Kubernetes API extensions
      • Contributions to Kubernetes or CNCF projects
      • Exposure to service meshes, ingress controllers, or workload identity providers