MLOps Lead Engineer

Recro, Gandhinagar, IN
20 hours ago
Job description

5+ years of experience in platform engineering, with a proven track record of designing, deploying, and managing scalable and secure cloud-based infrastructures, leveraging both Azure and AWS services.

• Experience with Azure services such as Azure AI services, Azure Search, Azure ML, Databricks, and Azure Kubernetes Service, and AWS services like AWS SageMaker, AWS Bedrock, and AWS Lambda.

• Exposure to Generative AI and Agentic AI ecosystems such as Azure OpenAI, Azure AI Foundry, Azure AI Hub, Bedrock, Anthropic Claude, OpenAI API, LlamaCloud, LangChain.

• Understanding of token usage, LLM prompt injection risks, jailbreak attempts, and mitigation techniques.

• Strong knowledge of governance, audit, observability, and compliance in cloud-based GenAI and ML ecosystems.

• Understanding of the Azure AI Evaluation SDK and AI Red Teaming prompt security scans.

• Good to have: experience with code assistant tools like GitHub Copilot, Cursor, and Claude Code.

• Expertise in Azure DevOps or AWS CodePipeline, including setting up and managing CI/CD pipelines.

• Advanced experience with Azure Blob Storage, Cosmos DB, SQL, Key Vault, AWS S3, DynamoDB, AWS RDS, etc., and their integrations with AI services.

• Advanced understanding of networking concepts, including DNS management, load balancing, VPNs, and virtual networks (VNets).

• Advanced understanding of security concepts, including IAM roles, identities, Azure policies, and AWS SCPs.

• Experience with advanced authentication and authorization concepts across various cloud providers and platforms.

• Must have experience with Azure Policy, AWS SCP, AWS IAM, audit logging, Azure RBAC, etc.

• Mastery of infrastructure-as-code tools such as Azure ARM/Bicep, Terraform, CloudFormation, or equivalent.

• Proficiency in networking, DNS, load balancers, and cloud engineering services.

• Knowledge of Python programming and AI/ML libraries (TensorFlow, PyTorch, scikit-learn, etc.), as well as Bash and PowerShell scripting.

• Experience with containerization and orchestration tools such as Docker and Kubernetes.

• Good to have: knowledge of Azure Bot Framework, APIM, and Application Gateway; M365 offerings like M365 Copilot; AWS CDK and the AWS SDK for Python (Boto3).

• Experience with monitoring tools like Grafana, Prometheus, Application Insights, Log Analytics Workspaces, and Azure Monitor.

• Understanding of common database technologies for both OLTP and OLAP applications.

• Azure services knowledge (Azure Machine Learning)

o Experience in ML tooling: knowledge of Azure Machine Learning studio, Python SDK (v2), and CLI (v2) to monitor, retrain, and redeploy models.

o Exposure, as an architect, to Azure Machine Learning models or models built from an open-source platform, such as PyTorch, TensorFlow, or scikit-learn.

o Practical knowledge of how to build efficient end-to-end ML workflows

o Understanding of machine learning & deep learning concepts and algorithms, various statistical techniques, and experimentation analysis workflows

o Enable production models across the ML lifecycle

o Implement CI/CD orchestration for data science pipelines

o Understanding of production deployments and post-deployment model lifecycle management activities: drift monitoring, model retraining, technical model evaluation, and business validation

o Work with stakeholders to assist with ML pipeline-related technical issues and support modelling infrastructure needs.

• Security and Access

o Familiarity with Microsoft AD

o Understanding of the principle of least privilege and its application to projects (RBAC)

• Testing

o Understand and apply unit testing in day-to-day development work

o Applied knowledge of integration testing as part of the CI/CD process (ideally on ADO)

• Infrastructure as Code

o Understand the key concepts of IaC and have some practical experience applying them

o Write code in languages such as Python or TypeScript to define cloud infrastructure as code

• Specific to AWS:

o AWS database services (such as RDS, DynamoDB, Redshift, Aurora)

o AWS compute services and storage (such as EC2 – including scaling, EBS, EFS)

o AWS serverless technology (including Lambda, SQS, SNS, EventBridge, and Step Functions)

o AWS KMS

o AWS container services (including ECR)

o AWS CloudFormation and the CDK

• Specific to Azure:

o Azure database services (including Cosmos DB, Azure SQL Serverless)

o Azure compute services (VM, VM Scale Sets)

o Azure serverless technology (e.g. Functions, Event Grid / Event Hubs, Queue Storage, Service Bus)

o Azure container services (including ACR / AKS)

o Azure Resource Manager (ARM) / Bicep

o Azure Key Vault

o Azure Machine Learning

o Azure Data Lake Storage
