We are seeking an experienced AI / ML Engineer with 5-9 years of hands-on experience in building, deploying, and scaling AI / ML models on the Azure cloud ecosystem. The ideal candidate will have strong expertise in Azure Machine Learning, Databricks, and Apache Spark, along with deep experience in LLMs, RAG architectures, vector databases, and AI agent systems.
You will play a key role in architecting end-to-end AI solutions, integrating ML workflows with data engineering pipelines, and ensuring robust MLOps practices across enterprise environments.
Key Responsibilities :
1. ML Architecture, Model Development & Deployment :
- Architect, develop, and deploy scalable ML solutions using Azure ML Studio, Azure Machine Learning Services, and managed compute clusters.
- Create reusable ML pipelines for training, evaluation, and inference with emphasis on scalability, versioning, and reproducibility.
- Optimize models for performance, latency, and cost using Azure infrastructure and ML engineering best practices.
2. RAG Pipelines & LLM System Development :
- Design and implement Retrieval-Augmented Generation (RAG) pipelines leveraging vector databases, embedding models, and cognitive search capabilities.
- Build and maintain multi-agent LLM systems using frameworks like LangChain, Semantic Kernel, or custom orchestration layers.
- Integrate foundation models (GPT, Llama, Mistral, etc.) and ensure robust prompt engineering, caching, evaluation, and inference workflows.
3. Data Engineering & Azure Integration :
- Build and operationalize ETL / ELT pipelines on Databricks, Azure Data Lake, Apache Spark, and Delta Lake.
- Work closely with data engineering teams to integrate ML models into large-scale data pipelines.
- Ensure seamless interaction between ML workloads and Azure-based services such as Azure Data Factory, Event Hubs, and Azure Synapse.
4. MLOps, CI / CD & Governance :
- Implement end-to-end MLOps frameworks using Azure DevOps, including automated model training, validation, deployment, and monitoring.
- Set up CI / CD pipelines for ML components, datasets, and infrastructure deployments.
- Enforce ML governance: model lineage, experiment tracking, auditability, responsible AI practices, and compliance with cloud security standards.
5. AI Integration & Multi-Agent Communication :
- Leverage MCP (Multi-Agent Communication Protocol) for orchestrating complex agent systems and enabling autonomous workflows.
- Design architecture patterns for agent collaboration, task delegation, contextual memory, and controlled execution.
- Integrate agents with enterprise systems, APIs, vector databases, and orchestration engines.
6. Monitoring, Optimization & Cloud Reliability :
- Monitor model drift, performance degradation, and system health across ML pipelines and production environments.
- Implement observability solutions using Azure Monitor, Log Analytics, ML telemetry, or third-party tools.
- Optimize compute, storage, and workflows for performance efficiency and cloud cost governance.
Required Skills & Qualifications :
- 5-9 years of experience in AI / ML engineering, model deployment, and cloud-based ML systems.
- Strong hands-on experience with Azure ML Studio, Azure Machine Learning Services, and Azure cloud architecture.
- Proficiency in Databricks, Apache Spark, PySpark, Python, SQL, and distributed data processing.
- Experience with LLMs, RAG pipelines, vector databases, embedding models, and AI agent architectures.
- Strong knowledge of Azure DevOps, CI / CD workflows, model packaging, and deployment automation.
- Hands-on experience with LangChain or Semantic Kernel, and with vector stores such as Pinecone, Chroma, or Weaviate.
- Understanding of MCP (Multi-Agent Communication Protocol) concepts and agent-to-agent interactions.
- Good understanding of cloud security, compliance, and governance in enterprise environments.
- Strong analytical skills with excellent problem-solving, debugging, and communication abilities.
Preferred (Nice-to-Have) :
- Experience with Azure OpenAI, generative AI services, or custom LLM fine-tuning.
- Familiarity with MLflow for experiment tracking and model lifecycle management.
- Knowledge of containerized model deployment using Docker & Kubernetes.
- Experience with real-time inferencing, model optimization (quantization, distillation), or GPU-based training.
- Exposure to architectural patterns for scalable, production-grade AI systems.
(ref : hirist.tech)