This role is for one of our clients.
Industry: Technology, Information and Media
Seniority Level: Mid-Senior level
Minimum Experience: 6 years
Location: Remote (India)
Job Type: Full-time
We are seeking a highly skilled Lead AI Infrastructure Engineer to drive the development and management of our AI and ML infrastructure. This role blends technical leadership with hands-on execution and covers the end-to-end ML lifecycle, from model training and deployment to monitoring, optimization, and scaling. You will lead a small team of engineers and ensure smooth collaboration between the research, engineering, and operations teams.
Key Responsibilities
ML Infrastructure & Lifecycle Management
Design, maintain, and optimize scalable infrastructure for ML training, inference, and experimentation.
Ensure model deployment pipelines are reliable, efficient, and cost-effective.
Implement robust monitoring, alerting, and automated rollback mechanisms to maintain system reliability.
Collaboration with Research & Product Teams
Partner with research teams to streamline workflows for training, evaluation, and fine-tuning of models.
Support AI-driven initiatives across product teams by providing reliable infrastructure and operational expertise.
Team Leadership & Mentorship
Lead a small team of ML engineers, providing guidance, mentoring, and technical support.
Balance hands-on engineering work with strategic oversight of infrastructure projects.
Performance & Optimization
Reduce model inference latency and improve throughput and cost-efficiency.
Apply model optimization techniques such as quantization, distillation, and TensorRT integration.
Automation & Best Practices
Develop and enforce CI/CD practices for ML models, including versioning, testing, and deployment.
Establish MLOps standards and drive operational excellence across teams.
Cloud & Platform Management
Leverage cloud-based ML platforms (AWS SageMaker, GCP Vertex AI, Azure ML) to optimize workflows and costs.
Maintain secure, compliant, and scalable AI environments for both training and inference workloads.
Architecture & Strategy
Contribute to ML architecture design, documentation, and roadmap planning.
Continuously evaluate emerging AI infrastructure technologies to improve efficiency and performance.
Qualifications & Skills
5+ years of hands-on experience in MLOps, ML Engineering, or AI Infrastructure roles.
Strong understanding of ML/DL concepts, with applied experience in model training and deployment.
Proficiency with cloud-native ML platforms: AWS SageMaker, GCP Vertex AI, or Azure ML.
Experience with Kubernetes, Docker, MLflow, Kubeflow, or similar MLOps and orchestration tools.
Familiarity with model optimization techniques: quantization, distillation, TensorRT, FasterTransformer.
Proven ability to lead technical projects and mentor engineers in a fast-paced environment.
Excellent communication and cross-functional collaboration skills.
Ownership-driven mindset and ability to bring clarity to ambiguous technical challenges.
Core Skills
MLOps | ML Infrastructure | Model Deployment | Model Monitoring | CI/CD for ML | Cloud ML Platforms | Kubernetes | Docker | Vertex AI | AWS SageMaker | Kubeflow | MLflow | Model Optimization
Skills Required
MLOps, Model Monitoring, Docker, Kubernetes
Lead AI Engineer • India