Key Responsibilities
- Design, develop, and maintain end-to-end MLOps pipelines to automate the machine learning lifecycle from training to deployment and monitoring.
- Collaborate with data scientists, ML engineers, and platform teams to operationalize ML models across cloud and hybrid environments.
- Build and manage containerized environments for training and inference using Docker and Kubernetes.
- Implement CI/CD workflows (e.g., GitHub Actions, Jenkins) for deploying ML models.
- Ensure observability and monitoring of models in production (latency, drift, performance, errors).
- Support model deployment to a variety of targets including APIs, applications, dashboards, and edge devices.
- Implement model versioning, rollback strategies, governance, and traceability using tools like MLflow or Kubeflow.
- Drive best practices across teams and provide technical mentorship on MLOps topics.
- Continuously evaluate and integrate new tools and technologies to improve MLOps capabilities.
Required Skills & Qualifications
- 8+ years of experience in software, data, or ML engineering, with 6+ years in MLOps.
- Strong programming experience in Python, SQL, and Spark/PySpark.
- Deep expertise in MLOps tools such as MLflow, Kubeflow, and Airflow.
- Experience with cloud platforms, preferably GCP (Vertex AI, GKE, Cloud Run); AWS experience is also acceptable.
- Hands-on experience with Databricks, FastAPI, Docker, and Kubernetes.
- Proficient with CI/CD, Git, and Infrastructure as Code (Terraform, Ansible).
- Knowledge of monitoring frameworks such as Prometheus and Grafana.
- Strong communication and stakeholder management skills.
Preferred Qualifications
- Experience building scalable, self-service ML infrastructure.
- Familiarity with model governance, compliance, and security in production environments.
- Prior work building reusable, modular MLOps solutions for cross-functional teams.
Skills Required
GCP, AWS, Airflow, Jenkins