About the Role:
We are seeking a highly skilled ML/DevOps Engineer with expertise in deploying and monitoring machine learning models at scale. This role focuses on ML model load testing, performance monitoring, and end-to-end automation using modern MLOps tools and platforms.
You will collaborate with ML Engineers, Backend Developers, and QA Engineers to build scalable, reliable, and robust deployment pipelines for ML models, especially those used in recommendation systems and real-time customer-facing applications.
Responsibilities:
- Design and implement end-to-end testing and load testing strategies for ML model deployments.
- Evaluate model latency, scalability, and reliability across varying RPS (requests per second).
- Build automated test cases for model performance validation before rollout.
- Set up real-time monitoring dashboards (e.g., with Grafana) to track error rates, response times, and other KPIs.
- Manage incidents involving ML model performance degradation or service failures.
- Collaborate with ML and backend teams to ensure smooth deployment pipelines using CI/CD tools like Jenkins.
- Develop and maintain model training and inference pipelines using platforms like Databricks, Kubeflow, and Tecton.
- Implement and maintain model versioning and lifecycle management using tools like MLflow and Seldon.
- Use AWS cloud services (S3, EC2, Lambda, EKS, SageMaker, etc.) for hosting and scaling ML workloads.
- Support and improve ML platform components, including the feature store, model monitoring, and data pipelines.
Skills:
- Databricks
- MLflow
- Seldon
- AWS (cloud services for ML)
- Kubeflow
- Tecton
Skills to Evaluate:
- Databricks
- MLflow
- Seldon
- AWS
- Kubeflow
- Tecton
- Jenkins
- Grafana (for monitoring)
- Python (for scripting, automation, and …)
Qualifications:
- Master's or Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.
- 3+ years of hands-on industry experience (excluding academic or research-only roles).
- Strong programming proficiency in Python, Java, or Scala.
- Proven experience working on online ML systems, especially recommendation engines.
- Understanding of ML model monitoring, custom ML platforms, and feature stores.
- Familiarity with MLOps best practices and software engineering principles in production environments.
(ref: hirist.tech)