Position Overview :
We are seeking an experienced System Administrator with deep expertise in Kubernetes and modern infrastructure management. The ideal candidate will be responsible for ensuring the stability, scalability, and security of our infrastructure across on-premises and cloud environments. This role requires strong skills in Linux systems, container orchestration, and automation, along with the ability to troubleshoot complex issues in distributed systems.
Key Responsibilities :
Kubernetes Operations :
- Deploy, configure, upgrade, and maintain Kubernetes clusters (on-prem and / or cloud).
- Manage and optimize workloads, namespaces, RBAC, storage classes, ingress controllers, and monitoring within Kubernetes.
- Implement and maintain GitOps or CI / CD workflows for Kubernetes resource deployment.
Systems Administration :
- Administer Linux servers, including package management, patching, and performance tuning.
- Manage storage (e.g., Ceph, NFS, cloud block / file stores) and networking for production systems.
- Troubleshoot and resolve system, networking, and application issues at both the OS and container orchestration layers.
Automation & Tooling :
- Build and maintain automation scripts and tooling (e.g., Ansible, Terraform, Helm, ArgoCD).
- Monitor systems with Prometheus, Grafana, Zabbix, or similar tools; set up alerts and dashboards.
- Automate backup, disaster recovery, and capacity planning tasks.
Security & Compliance :
- Implement Kubernetes best practices for cluster security, secrets management, and network policies.
- Ensure compliance with company policies, audits, and industry security standards.
Collaboration :
- Work closely with DevOps, developers, and network / security teams to support reliable application delivery.
- Participate in on-call rotations and incident response as needed.
Required :
- 3+ years of hands-on Linux systems administration experience.
- 2+ years managing Kubernetes clusters in production (K8s internals, kube-proxy, CNI plugins, etc.).
- Experience with Proxmox virtualization platforms in a production environment.
- Hands-on experience managing Ceph distributed storage systems (including CephFS, RadosGW, or block storage).
- Proficiency with container runtimes (Docker, containerd) and Helm charts.
- Experience with infrastructure automation (Terraform, Ansible, or similar).
- Strong troubleshooting and debugging skills across systems, networks, and applications.
Preferred :
- Experience with cloud providers (AWS, GCP, Azure) and hybrid deployments.
- Knowledge of service mesh (Istio / Linkerd), MetalLB, or Kubernetes GPU workloads.
- Experience in monitoring / alerting with Prometheus, Grafana, or Zabbix.
- Scripting / programming knowledge (Python, Go, or Bash).
Soft Skills :
- Strong analytical and problem-solving abilities.
- Excellent written and verbal communication.
- Ability to prioritize tasks and manage time effectively in a fast-moving environment.
- Collaborative mindset with a focus on reliability and scalability.
(ref : hirist.tech)