Job Description - MLOps Engineer (4-6 Years)
We are seeking an experienced MLOps Engineer with strong expertise in LLM deployment, optimization, and scalable model serving. The ideal candidate will work at the intersection of AI/ML engineering, DevOps, and cloud infrastructure, ensuring seamless integration of large-scale AI models into production.
Responsibilities:
- Design, deploy, and manage Large Language Models (LLMs) in production environments.
- Build and optimize scalable ML pipelines for training, fine-tuning, and inference.
- Implement MLOps best practices including CI/CD for ML, experiment tracking, and automated retraining workflows.
- Optimize model performance through quantization, pruning, distillation, and GPU/TPU acceleration.
- Manage and monitor LLM serving infrastructure with Kubernetes, Docker, and orchestration tools.
- Collaborate with data scientists and researchers to integrate models into real-world applications.
- Ensure reliability, scalability, and security of deployed ML systems.
- Implement observability and monitoring for model performance, drift detection, and resource utilization.
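As a concrete illustration of the drift-detection duty above, here is a minimal, self-contained sketch of a population stability index (PSI) check, a common drift metric. The function, thresholds, and data are illustrative assumptions, not part of the posting; production monitoring would feed such a score into tools like Prometheus or Grafana.

```python
import math

def population_stability_index(expected, actual, bins=10):
    """Rough drift score between a reference ('expected') and live
    ('actual') feature distribution. PSI > 0.2 is a common alert
    threshold (a rule of thumb, not from the posting)."""
    lo = min(expected + actual)
    hi = max(expected + actual)
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Smooth slightly so empty bins do not produce log(0).
        return [(c + 1e-6) / (len(xs) + bins * 1e-6) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]
print(population_stability_index(reference, reference))  # ~0.0 (no drift)
print(population_stability_index(reference, shifted))    # large (drift alert)
```

Identical distributions score near zero; a shifted live distribution pushes the score past the alert threshold, which is the signal an automated retraining workflow would act on.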
Key Skills & Experience:
- 4-6 years of hands-on experience in MLOps/ML Engineering.
- Expertise in LLM deployment, fine-tuning, and inference optimization.
- Strong knowledge of Kubernetes, Docker, MLflow, Kubeflow, Airflow, or similar platforms.
- Experience with model compression, distributed training (Horovod, DeepSpeed, Ray), and serving frameworks (TensorRT, Triton Inference Server, TorchServe, Hugging Face Inference).
- Proficiency in Python, PyTorch/TensorFlow, and cloud platforms (AWS/GCP/Azure).
- Hands-on experience with CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI).
- Familiarity with vector databases (Pinecone, Weaviate, FAISS, Milvus) for LLM applications.
- Understanding of observability tools (Prometheus, Grafana, ELK, Datadog).
Preferred:
- Experience with retrieval-augmented generation (RAG) pipelines.
- Knowledge of LangChain, LlamaIndex, or similar frameworks.
- Exposure to multi-modal LLMs and real-time inference systems.
(ref: hirist.tech)
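To make the RAG-pipeline experience mentioned above concrete, here is a minimal sketch of the retrieval step: ranking documents by cosine similarity against a query embedding. The documents, vectors, and `retrieve` function are hypothetical toy data; a real pipeline would use an embedding model and a vector database such as FAISS or Milvus, as the posting lists.

```python
import math

# Toy in-memory vector store (hypothetical 3-dimensional embeddings;
# real systems use learned embeddings of hundreds of dimensions).
docs = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.8, 0.1],
    "doc3": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=2):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# A query embedding pointing in doc1's direction retrieves doc1 first.
print(retrieve([1.0, 0.0, 0.0]))  # ['doc1', 'doc2']
```

The retrieved documents would then be packed into the LLM prompt as context, which is the "augmented generation" half of a RAG pipeline.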