Role Summary
The Director of AI Platform & MLOps is a critical, hands-on technology leader responsible for the architecture, execution, and day-to-day management of the infrastructure powering our AI Training Data Services. You will lead a world-class team of engineers to build, scale, and operate a robust, secure, and highly automated platform. This role requires a deep technical background in cloud infrastructure, MLOps, and large-scale systems, combined with proven experience in leading high-performing engineering teams.
Responsibilities
Technical Execution & Architecture
- Translate the long-term vision into an executable technical roadmap, making key architectural decisions for the platform.
- Lead the hands-on design and implementation of our MLOps/LLMOps framework, including CI/CD for models, data and model versioning, automated workflows, and monitoring.
- Engineer and manage a scalable, multi-cloud infrastructure (AWS/GCP/Azure) using Infrastructure as Code (IaC) principles (Terraform, CloudFormation).
- Oversee the technical integration, scaling, and reliability of data annotation platforms and the gig worker technology layer.
- Drive SRE best practices to ensure high availability, performance, and security of the entire platform.
Team & Operational Leadership
- Recruit, lead, mentor, and directly manage a team of Cloud, DevOps, MLOps, and Site Reliability Engineers.
- Foster a culture of technical excellence, automation, and accountability within the engineering team.
- Manage the day-to-day project timelines, resource allocation, and operational activities to ensure successful platform delivery.
- Implement and optimize the cloud budget (FinOps) for your domain, ensuring cost-effective scaling.
Stakeholder & Client Engagement
- Act as the primary technical expert in project-level discussions with clients, providing detailed solutioning and estimation for new deals.
- Collaborate closely with product management and service delivery teams to ensure the platform meets their requirements.
Skills & Qualifications
- Experience: 12+ years in platform engineering or DevOps, with at least 4 years in a leadership role directly managing engineering teams.
- Cloud Architecture Mastery: Deep, hands-on experience designing and managing complex infrastructure on AWS, GCP, or Azure. Expertise with Kubernetes (EKS, GKE, AKS), serverless, and core cloud services.
- Expertise in MLOps/LLMOps: Demonstrable, in-depth implementation experience with the full MLOps lifecycle (e.g., Kubeflow, MLflow, Seldon, DVC, Airflow) and infrastructure for LLMs (vector databases, fine-tuning environments).
- Infrastructure as Code (IaC): Strong, hands-on proficiency with Terraform or CloudFormation is a must.
- Technical Leadership: Proven ability to lead, mentor, and scale a technical team, driving projects from conception to production.
- Problem Solving: Exceptional ability to debug and solve complex technical issues in a distributed systems environment.
Skills Required
Airflow, CloudFormation, Cloud Infrastructure, GCP, MLOps, Terraform, Azure, Kubernetes, AWS