Job Title: MLOps Lead
Location: Mumbai (Mahape)
Work Mode: Hybrid, 4 days work from office (WFO) and 1 day work from home (WFH)
Notice Period: Immediate to 30 days
About the Role
We are looking for a Lead MLOps Engineer to spearhead the design, deployment, monitoring, and optimization of machine learning and AI-driven applications across cloud platforms. This role requires a strong technical leader who can collaborate with cross-functional teams, ensure scalability, and drive production-ready ML solutions.
Key Responsibilities
- Lead end-to-end ML pipeline development, deployment, and monitoring across GCP and Azure.
- Independently design and build MLOps solutions from scratch.
- Develop and maintain CI/CD pipelines (ArgoCD, GitHub Actions, GitLab CI, Docker).
- Automate and optimize ML model training, validation, deployment, and scaling using Kubernetes and Kubeflow.
- Design data processing workflows using Python & PySpark on distributed systems.
- Implement observability and monitoring with tools such as Grafana, New Relic, and Prometheus.
- Transition research models into production-grade solutions in collaboration with Data Scientists.
- Guide and mentor junior engineers, conduct code reviews, and enforce coding best practices.
- Manage Infrastructure as Code (IaC) for reproducibility and scalability.
- Work on AI and RAG (retrieval-augmented generation) development, GPU workloads, and AI platforms.
- Exposure to Vertex AI Pipelines or similar cloud ML pipeline services is a plus.
Required Skills & Experience
- 9+ years of hands-on experience in MLOps.
- Strong coding expertise in Python and PySpark.
- Proven experience with GCP and Azure (compute, storage, networking).
- Proficiency in Docker, Kubernetes, and container orchestration.
- Experience with ML lifecycle management, versioning, and monitoring.
- Hands-on experience with MLflow for model versioning.
- Ability to build REST APIs (FastAPI, Flask, or Django).
- Strong problem-solving, communication, and stakeholder management skills.
- Demonstrated experience mentoring and leading teams.
Skills Required
New Relic, PySpark, Prometheus, Grafana, Django, GCP, Docker, Flask, FastAPI, REST APIs, Azure, Kubernetes, Python