We are seeking an experienced Kubernetes Expert who will be responsible for designing, implementing, and managing large-scale Kubernetes clusters with a strong focus on performance, security, and reliability.
Key Responsibilities:
- Design, deploy, and manage highly available Kubernetes clusters across multi-cloud and on-prem environments.
- Implement security best practices, role-based access control (RBAC), and compliance policies.
- Ensure smooth scaling, monitoring, and troubleshooting of clusters to meet enterprise-grade requirements.
- Integrate GPU support within Kubernetes clusters to optimize performance for AI/ML workloads.
- Collaborate with data science and engineering teams to ensure seamless execution of GPU-intensive applications.
- Develop and implement metering and monitoring solutions to track cloud resource consumption.
- Optimize resource allocation and provide insights for cost optimization and efficiency.
- Provide expertise on integrating Kubernetes with OpenStack environments.
- Manage and optimize hybrid cloud deployments leveraging both Kubernetes and OpenStack.
- Work closely with DevOps, Cloud, and Infrastructure teams to implement best practices.
- Prepare detailed documentation, runbooks, and guidelines for cluster operations.
Required Expertise & Skills:
- Proven experience in designing, deploying, and managing Kubernetes clusters at scale.
- Hands-on experience in enabling GPU support in Kubernetes for AI/ML workloads.
- Strong knowledge of containerization technologies (Docker, CRI-O, containerd, etc.).
- Experience with monitoring and metering solutions (Prometheus, Grafana, custom tooling, etc.) for cloud resource utilization.
- Understanding of networking concepts within Kubernetes (CNI plugins, ingress, service mesh, etc.).
- Good knowledge of OpenStack services and experience with Kubernetes-OpenStack integration (preferred).
- Strong problem-solving, debugging, and performance-tuning skills.
- Familiarity with CI/CD pipelines and automation tools (Helm, Ansible, Terraform, ArgoCD, etc.).
(ref: hirist.tech)