We are looking for a highly skilled MLOps & LLM Ops Engineer with strong expertise in deploying, automating, and monitoring AI/ML models, including Large Language Models (LLMs), in production environments. The ideal candidate will have hands-on experience with CI/CD automation, container orchestration, data pipelines, LangChain, and cloud deployment on Azure and AWS. You will collaborate with data scientists, ML engineers, and customer architects to ensure seamless end-to-end delivery of scalable, high-performing AI systems.
Key Responsibilities:
1. Model Deployment & Automation:
- Automate the full lifecycle of AI/ML model deployment, including packaging, orchestration, scaling, and rollout strategies.
- Implement automated workflows for data and model versioning and experiment tracking using tools such as MLflow or similar systems.
- Deploy Large Language Models (LLMs) to production using frameworks such as LangChain, Flask, FastAPI, or custom inference frameworks.
- Containerize and orchestrate model services using Docker and Kubernetes, enabling highly available and fault-tolerant inference pipelines (a minimal serving sketch follows this list).
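For concreteness, here is a minimal sketch of the kind of model service this role would build and containerize. It is illustrative only: the model URI, request schema, and endpoint name are hypothetical, and MLflow's pyfunc loader stands in for whatever packaging is actually used.

```python
# Minimal FastAPI inference service (illustrative sketch).
# The MLflow model URI and request schema below are hypothetical.
import mlflow.pyfunc
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical registry URI; in practice this points at a promoted model version.
MODEL_URI = "models:/churn-classifier/Production"
model = mlflow.pyfunc.load_model(MODEL_URI)

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # pyfunc models accept array-like input; wrap the single row in a batch.
    prediction = model.predict(np.array([req.features]))
    return {"prediction": np.asarray(prediction).tolist()}
```

Served with uvicorn (`uvicorn app:app`), the same container image can then sit behind a Kubernetes Deployment and Service for the high-availability setup described above.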
2. CI/CD & Infrastructure Automation:
- Build and maintain robust CI/CD pipelines using Git, Jenkins, GitHub Actions, or GitLab CI for continuous integration, testing, and deployment of ML solutions (see the smoke-test sketch below).
- Implement infrastructure-as-code (IaC) for automated provisioning of cloud resources (Terraform or equivalent).
- Automate deployment workflows for API endpoints, microservices, feature stores, and data processing pipelines.
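A CI pipeline for such a service typically gates promotion on automated tests. As a hedged example, a pytest smoke test against the serving sketch above, assuming that app is importable from a (hypothetical) app.py module:

```python
# Smoke test a CI stage could run before promoting a model image.
# Assumes the FastAPI app from the earlier sketch lives in app.py.
from fastapi.testclient import TestClient

from app import app  # hypothetical module name

client = TestClient(app)

def test_predict_returns_a_prediction():
    resp = client.post("/predict", json={"features": [0.1, 0.2, 0.3]})
    assert resp.status_code == 200
    assert "prediction" in resp.json()
```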
3. Data Pipelines & Real-Time Processing:
- Design, deploy, and manage data ingestion and processing pipelines using Airflow, Kafka, and RabbitMQ (a DAG sketch follows this list).
- Ensure reliable, scalable, and secure data pipelines that support both training and inference workflows.
- Optimize data freshness, batch scheduling, and streaming performance for high-throughput model operations.
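A minimal Airflow sketch of such an ingestion-and-processing pipeline (Airflow 2.x style; the DAG id, schedule, and task bodies are placeholders):

```python
# Illustrative Airflow DAG: ingest raw events, then transform them.
# DAG id, schedule, and task logic are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    # e.g. pull a batch from Kafka or an object store
    print("ingesting raw events")

def transform():
    # e.g. clean events and publish features for training and inference
    print("transforming events into features")

with DAG(
    dag_id="events_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
):
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    ingest_task >> transform_task
```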
4. LLM & Foundation Model Operations:
- Integrate and operationalize foundation model APIs such as OpenAI, Anthropic, Gemini, Cohere, etc.
- Deploy custom or fine-tuned LLMs (GPT, Llama, Mistral, etc.) using LangChain or custom inference frameworks.
- Implement prompt management, evaluation, caching, vector store integrations, and retrieval-augmented generation (RAG) pipelines (see the sketch after this section).
- Ensure high performance, low latency, and reliability of LLM-based production systems.
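To make the RAG responsibility concrete, a minimal LangChain sketch. LangChain's APIs move quickly; this targets roughly the 0.2-era package layout, and the documents, k value, and model name are placeholders.

```python
# Minimal retrieval-augmented generation sketch with LangChain.
# Requires the faiss package and an OPENAI_API_KEY in the environment.
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

docs = [  # placeholder documents
    "Our SLA guarantees 99.9% uptime for inference endpoints.",
    "Model rollbacks are triggered automatically on drift alerts.",
]

# Embed the documents into an in-memory FAISS index and expose a retriever.
vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

def answer(question: str) -> str:
    # Retrieve context, stuff it into the prompt, and call the chat model.
    context = "\n".join(d.page_content for d in retriever.invoke(question))
    llm = ChatOpenAI(model="gpt-4o-mini")  # hypothetical model choice
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content

print(answer("What uptime does the SLA guarantee?"))
```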
5. Cloud Deployment & Infrastructure Management:
- Deploy ML workloads in Azure or AWS using services like Kubernetes (AKS/EKS), Lambda, EC2, S3/ADLS, API Gateway, Azure Functions, etc. (a small boto3 sketch follows this section).
- Monitor and optimize infrastructure cost, performance, and scalability for ML and LLM systems.
- Collaborate with customer architects to define, plan, and execute end-to-end deployments and solution architectures.
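As a small illustration of the deployment glue involved on the AWS side, a boto3 sketch that uploads a packaged model artifact to S3 and smoke-tests a Lambda-hosted inference function (bucket, key, and function names are all hypothetical):

```python
# Push a model artifact to S3, then invoke a Lambda inference function.
# Bucket, key, and function names are hypothetical placeholders.
import json

import boto3

s3 = boto3.client("s3")
s3.upload_file("model.tar.gz", "ml-artifacts-bucket", "models/model.tar.gz")

lam = boto3.client("lambda")
resp = lam.invoke(
    FunctionName="inference-fn",
    Payload=json.dumps({"features": [0.1, 0.2, 0.3]}),
)
print(json.loads(resp["Payload"].read()))
```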
6. Monitoring, Observability & Performance Optimization:
- Implement and maintain observability stacks for model performance monitoring, including:
  - Latency, throughput, and drift detection
  - Model accuracy and quality metrics
  - Resource utilization and autoscaling behavior
- Use tools like Prometheus, Grafana, ELK, Datadog, or cloud-native monitoring solutions (see the instrumentation sketch below).
- Troubleshoot production issues and perform root cause analysis across models, pipelines, and infrastructure.
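To ground the observability items above, a minimal sketch of request-level instrumentation with the prometheus_client library (metric names, the port, and the simulated workload are placeholders):

```python
# Expose request-count and latency metrics for an inference service.
# Metric names, port, and the simulated workload are placeholders.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Total inference requests")
LATENCY = Histogram("inference_latency_seconds", "Inference latency in seconds")

@LATENCY.time()  # records each call's duration in the histogram
def predict(features):
    REQUESTS.inc()
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for real model work
    return sum(features)

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://localhost:8000/metrics
    while True:
        predict([0.1, 0.2, 0.3])
```

Grafana dashboards and alert rules can then be built on these series, with drift and quality metrics exported the same way.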
Skills & Qualifications:
- Strong hands-on experience in MLOps, production ML workflows, and automation.
- Expertise in CI/CD tools (Git, Jenkins, GitHub Actions, GitLab CI).
- Strong experience with Docker and Kubernetes for model containerization and deployment.
- Practical knowledge of MLflow, LangChain, and experiment tracking/versioning systems.
- Experience with Airflow, Kafka, and RabbitMQ for large-scale data workflow orchestration.
- Experience working with foundation model APIs (OpenAI, Anthropic, etc.).
- Hands-on deployment experience on Azure and/or AWS cloud platforms.
- Familiarity with performance monitoring tools (Prometheus, Grafana, Datadog, CloudWatch, etc.).
- Solid understanding of distributed systems, microservices, and cloud-native architectures.
- Strong communication, analytical, and debugging skills.
- Ability to work in fast-paced environments and manage complex deployments.

Preferred (Nice-to-Have):
- Knowledge of vector databases (Pinecone, Weaviate, FAISS, Chroma).
- Experience with RAG pipelines, semantic search, embeddings, or LLM orchestration frameworks.
- Exposure to model optimization techniques such as quantization, distillation, or low-latency inference optimization.
- Hands-on experience with Terraform, Helm, or ArgoCD.
- Experience with GPU-based deployments and optimization in cloud platforms.

(ref: hirist.tech)