We are looking for a highly skilled DevOps Engineer with strong experience in DevSecOps and MLOps/LLMOps to design, automate, and secure our development and deployment pipelines. You will play a critical role in building scalable, secure, production-ready infrastructure that supports both traditional applications and machine learning/LLM workloads. This role demands a strong understanding of Kubernetes, CI/CD pipelines, infrastructure as code, model lifecycle management, and cloud-native security practices.
DevOps & Infrastructure
- Design, implement, and manage scalable, fault-tolerant infrastructure in cloud or hybrid environments (AWS, GCP, Azure, Hetzner, or bare metal).
- Develop and maintain CI/CD pipelines using tools like GitHub Actions, GitLab CI, Jenkins, or ArgoCD.
- Manage containerized workloads using Kubernetes, Helm, and Docker.
- Implement infrastructure as code (IaC) with Terraform, OpenTofu, or Terragrunt.
- Monitor system performance, availability, and cost efficiency using Prometheus, Grafana, ELK, or Loki.
DevSecOps
- Integrate security automation into CI/CD pipelines (SAST, DAST, SCA, dependency scanning).
- Implement policy as code using OPA/Conftest and enforce RBAC/IAM best practices.
- Manage secrets and credentials using tools like Vault, Sealed Secrets, or External Secrets Operator.
- Set up vulnerability scanning and runtime protection (e.g., Trivy, Falco, Aqua Security).
- Define security baselines for infrastructure, network, and containers.
MLOps/LLMOps
- Collaborate with ML and data teams to operationalize model training, evaluation, and deployment.
- Build automated pipelines for data preprocessing, model training, and inference deployment using tools like Kubeflow, MLflow, or Airflow.
- Manage feature stores, model registries, and monitoring for drift, latency, and accuracy.
- Support LLM pipelines: prompt orchestration, fine-tuning, vector DB integrations, and retrieval-augmented generation (RAG).
- Optimize GPU-based workloads and manage distributed training and inference infrastructure.
Required Skills & Qualifications
- Languages: Python, Bash, Go (preferred)
- IaC Tools: Terraform, OpenTofu, Terragrunt
- CI/CD: GitHub Actions, GitLab CI, Jenkins, ArgoCD
- Containers: Docker, Kubernetes, Helm
- Monitoring: Prometheus, Grafana, Loki, ELK
- Security: Trivy, Falco, Vault, OPA, Snyk
- MLOps Tools: MLflow, Kubeflow, Airflow, Weights & Biases
- Cloud Platforms: AWS, GCP, Azure, Hetzner
- Databases: PostgreSQL, Redis, Vector DBs (Milvus, Pinecone, Weaviate, Qdrant)
Nice to Have
- Experience with GPU orchestration on Kubernetes (NVIDIA GPU Operator, KServe).
- Exposure to LLM frameworks (LangChain, LlamaIndex, vLLM, Ollama).
- Knowledge of data governance and compliance (GDPR, SOC 2).
- Experience with self-hosted runners, GitOps, or multi-cluster management.
- Familiarity with event-driven systems (Kafka, NATS, or Redis Streams).
What We Offer
- Opportunity to work on challenging, large-scale systems with real-world impact.
- Collaborative team culture with a focus on learning and innovation.
- Competitive compensation and growth opportunities.