Role Overview
We are looking for an experienced MLOps Lead with deep expertise in the Azure and AWS cloud ecosystems who can design, deploy, and manage scalable AI/ML infrastructure. The ideal candidate brings a strong background in cloud governance, GenAI tooling, automation, and CI/CD pipelines, with hands-on experience across modern MLOps frameworks.
Key Responsibilities
Design, implement, and manage scalable cloud-based AI/ML infrastructure across Azure and AWS.
Drive the end-to-end MLOps lifecycle: model deployment, monitoring, retraining, and governance.
Enable GenAI and agentic AI platforms using Azure OpenAI, Bedrock, Anthropic Claude, LangChain, and similar tools.
Implement CI/CD pipelines using Azure DevOps or AWS CodePipeline.
Ensure security, observability, and compliance across ML and GenAI ecosystems.
Manage infrastructure automation via Terraform, Bicep, CloudFormation, or similar IaC tools.
Collaborate with data science and engineering teams to optimize ML workflows, data pipelines, and API integrations.
Implement monitoring and alerting using Grafana, Prometheus, Azure Monitor, and Application Insights.
Oversee networking, identity management, and role-based access controls (IAM, RBAC) across clouds.
Support model lifecycle management: drift monitoring, retraining, technical evaluation, and business validation.
Technical Skills & Expertise
Cloud & MLOps Platforms
Azure: Azure ML, Azure AI Services, Azure OpenAI, Azure Kubernetes Service (AKS), Databricks, Azure AI Search, Azure Blob Storage, Cosmos DB, Azure SQL, Azure Functions, Azure Event Hubs, Azure Resource Manager (ARM), Bicep.
AWS: SageMaker, Bedrock, Lambda, DynamoDB, S3, RDS, Redshift, ECR, CloudFormation, CDK, KMS, EventBridge, Step Functions.
AI/ML & Programming
Hands-on experience in Python, with exposure to TensorFlow, PyTorch, and scikit-learn.
Understanding of LLM tokenization, prompt injection risks, jailbreak prevention, and AI safety techniques.
Familiarity with LangChain, LlamaCloud, AI Foundry, and related frameworks.
Experience in model monitoring, retraining, and evaluation workflows.
DevOps & Infrastructure
Expertise in CI/CD pipelines, containerization (Docker, Kubernetes), and infrastructure automation.
Strong in governance, audit logging, and security policies (Azure Policy, AWS SCP, IAM).
Deep understanding of networking, DNS, load balancers, VNets/VPCs, and VPNs.
Skilled in IaC tools: Terraform, Bicep, ARM, CloudFormation.
Monitoring & Observability
Experience with Grafana, Prometheus, Application Insights, Log Analytics workspaces, and Azure Monitor.
Security & Access Management
Understanding of Microsoft AD, least-privilege principles, IAM, and RBAC.
Testing & Automation
Familiarity with unit testing and integration testing in CI/CD workflows (preferably Azure DevOps).
Good to Have
Experience with Azure Bot Framework, M365 Copilot, and APIM.
Exposure to code assistants such as GitHub Copilot, Cursor, and Claude Code.
Knowledge of the Boto3 SDK (AWS Python) and TypeScript for IaC.
Preferred Background
Strong background in cloud infrastructure engineering and machine learning operations.
Proven ability to lead cross-functional teams and implement AI governance at scale.
Excellent problem-solving, communication, and documentation skills.
MLOps Engineer • Amritsar, Punjab, India