Key Responsibilities:
- Design, build, and maintain CI/CD pipelines for ML model training, validation, and deployment
- Automate and optimize ML workflows, including data ingestion, feature engineering, model training, and monitoring
- Deploy, monitor, and manage LLMs and other ML models in production (on-premises and/or cloud)
- Implement model versioning, reproducibility, and governance best practices
- Collaborate with data scientists, ML engineers, and software engineers to streamline the end-to-end ML lifecycle
- Ensure the security, compliance, and scalability of ML/LLM infrastructure
- Troubleshoot and resolve issues related to ML model deployment and serving
- Evaluate and integrate new MLOps/LLMOps tools and technologies
- Mentor junior engineers and contribute to best-practices documentation
Required Skills & Qualifications:
- 8+ years of experience in DevOps, including at least 3 years in MLOps/LLMOps
- Strong experience with cloud platforms (AWS, Azure, GCP) and with containerization and orchestration tools (Docker, Kubernetes)
- Proficiency in CI/CD tools (Jenkins, GitHub Actions, GitLab CI, etc.)
- Hands-on experience deploying and managing various types of AI models (e.g., OpenAI, HuggingFace, custom models) for use in building solutions
- Experience with model serving tools such as TGI, vLLM, and BentoML
- Solid scripting and programming skills (Python, Bash, etc.)
- Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK stack)
- Strong understanding of security and compliance in ML environments
Preferred Skills:
- Knowledge of model explainability, drift detection, and model monitoring
- Familiarity with data engineering tools (Spark, Kafka, etc.)
- Knowledge of data privacy, security, and compliance in AI systems
- Strong communication skills to collaborate effectively with various stakeholders
- Critical-thinking and problem-solving skills (essential)
- Proven ability to lead and manage projects with cross-functional teams
Skills Required:
Prometheus, ELK Stack, Bash, Grafana, Jenkins, DevOps, GCP, MLOps, Docker, Azure, Kubernetes, Python, AWS