(ML OPNS Lead) Machine Learning Operations Lead Engineer

Epergne Solutions
Chennai, Tamil Nadu, India
19 days ago
Job description

Role: ML Ops Lead Engineer

Job Mode: Remote

Experience: 7+ Years

Notice Period: Immediate / 10 to 15 Days

Experience Required:

7+ years in platform or infrastructure engineering with significant experience in ML Ops, AI, and Cloud (Azure & AWS).

Key Responsibilities:

  • Design, deploy, and manage scalable, secure, and high-performing cloud-based infrastructures across Azure and AWS.
  • Lead the end-to-end ML Ops lifecycle, including model deployment, monitoring, retraining, and CI/CD integration.
  • Collaborate with AI/ML, Data Science, and DevOps teams to automate model lifecycle management and streamline ML workflows.
  • Architect and implement governance, compliance, observability, and security frameworks for ML and GenAI systems.
  • Drive innovation in Generative AI and Agentic AI ecosystems, integrating services like Azure OpenAI, Bedrock, Anthropic Claude, and OpenAI API.
  • Implement infrastructure-as-code (IaC) practices using Terraform, Bicep, ARM, or CloudFormation.
  • Manage networking, IAM, and security configurations across Azure and AWS environments.
  • Establish monitoring, alerting, and performance dashboards using Grafana, Prometheus, Azure Monitor, and Log Analytics.

Required Technical Skills:

Cloud Platforms:

  • Azure: Azure AI Services, Azure Search, Azure ML, Databricks, AKS, Azure AI Foundry, Azure AI Hub.
  • AWS: SageMaker, Bedrock, Lambda, ECS, CDK, CloudFormation.

AI/ML & Generative AI:

  • Exposure to Generative and Agentic AI ecosystems (Azure OpenAI, Bedrock, Claude, LlamaCloud, LangChain).
  • Understanding of token usage, prompt injection, jailbreak risks, and mitigation methods.
  • Experience with Azure AI Evaluation SDK and AI Red Teaming Prompt Security Scans.
  • Hands-on experience with Python ML libraries (TensorFlow, PyTorch, Scikit-learn).

DevOps & Automation:

  • Strong experience with Azure DevOps / AWS CodePipeline for CI/CD setup and management.
  • Familiarity with Docker, Kubernetes, and container orchestration.
  • Knowledge of IaC tools (Terraform, ARM / Bicep, CloudFormation).

Database & Storage:

  • Azure Blob Storage, Cosmos DB, SQL, Key Vault, Data Lake Storage.
  • AWS S3, DynamoDB, RDS, Redshift, Aurora.
  • Understanding of OLTP and OLAP systems.

Networking & Security:

  • Proficiency in DNS, VPNs, Load Balancing, VNets, IAM, and access control (RBAC, SCP, Azure Policy).
  • Familiarity with Microsoft AD and principles of least privilege.
  • Hands-on with KMS, Key Vault, and identity governance best practices.

ML Engineering & Workflow Management:

  • Experience using Azure Machine Learning Studio, SDK (v2), and CLI (v2) for model monitoring, retraining, and deployment.
  • Build and optimize end-to-end ML workflows for production environments.
  • Implement drift monitoring, model retraining, and technical and business validation processes.
  • Collaborate with data scientists on model deployment and performance optimization.

Additional Skills (Good to Have):

  • Experience with code assistant tools (GitHub Copilot, Cursor, Claude Code).
  • Familiarity with Azure Bot Framework, APIM, Application Gateway.
  • Exposure to M365 Copilot and related ecosystem tools.
  • Proficiency with AWS Python SDK (Boto3) and AWS CDK.

Testing & Quality:

  • Implement unit and integration testing in CI/CD workflows (preferably using Azure DevOps).
  • Ensure testing and validation coverage for ML pipelines and infrastructure deployments.

Preferred Qualifications:

  • Bachelor's or Master's in Computer Science, Information Technology, or a related field.
  • Certifications such as Azure AI Engineer, AWS Machine Learning Specialty, or DevOps are highly desirable.