Position : Lead MLOps Engineer
Experience : 9+ Years
Location : Mumbai, Mahape
Email ID for CV sharing : [HIDDEN TEXT]
About the Role
We are looking for a Lead MLOps Engineer who will lead and support the end-to-end deployment, monitoring, and optimization of machine learning and data-driven applications across cloud platforms. You will collaborate with data scientists, engineers, and business stakeholders to ensure scalable, secure, and highly available ML systems.
Key Responsibilities :
- Be a hands-on contributor capable of independently designing and developing complete MLOps solutions from scratch.
- Lead end-to-end ML pipeline development, deployment, and monitoring across GCP and Azure.
- Build and maintain CI/CD pipelines using tools like ArgoCD, Git, and Docker.
- Automate and optimize ML model training, validation, deployment, and scaling using Kubernetes, Kubeflow, or similar orchestration platforms.
- Develop data processing workflows using Python and PySpark on distributed systems.
- Implement observability using tools like Grafana, New Relic, and cloud-native monitoring solutions.
- Collaborate with Data Scientists to transition research into production-grade solutions.
- Guide and mentor junior engineers, enforce coding standards, and conduct code reviews.
- Demonstrate business understanding to align ML pipelines with product goals.
- Manage infrastructure as code (IaC) for reproducibility and scalability.
- Exposure to AI and RAG-related development, and to various GPU and AI platforms, is required.
Required Skills & Experience :
- 9+ years of hands-on experience in MLOps roles.
- Strong proficiency in Python and PySpark with clean and scalable code practices.
- Expertise in GCP and Azure cloud platforms, including compute, storage, and networking components.
- Proven experience in deploying and managing containerized applications using Docker and Kubernetes.
- Hands-on with CI/CD tools, preferably ArgoCD, GitHub Actions, or GitLab CI.
- Experience in monitoring, logging, and alerting using tools such as Grafana, New Relic, Prometheus, or similar.
- Understanding of ML model lifecycle, versioning, and performance monitoring.
- Experience with MLflow for model versioning.
- Ability to create REST APIs using FastAPI, Flask, or Django.
- Strong problem-solving, communication, and stakeholder management skills.
- Experience mentoring teams and driving end-to-end project execution.
- Exposure to Vertex AI Pipelines in GCP or similar in other clouds is a plus.
Skills Required
New Relic, PySpark, Prometheus, Grafana, Django, GCP, Docker, Flask, FastAPI, Azure, Kubernetes, Python