Project Description
We are seeking a skilled ML Platform Engineer to automate, deploy, patch, and maintain our machine learning platform infrastructure. The ideal candidate will have hands-on experience with Cloudera Data Science Workbench (CDSW), Cloudera Data Platform (CDP), Docker, Kubernetes, Python, Ansible, GitLab, and MLOps best practices.
Responsibilities
Automate deployment and management processes for machine learning platforms using tools such as Ansible and Python.
Deploy, monitor, and patch ML platform components, including Cloudera Data Science Workbench (CDSW), Docker containers, and Kubernetes clusters.
Ensure high availability and reliability of ML infrastructure through proactive maintenance and regular updates.
Develop and maintain comprehensive documentation for platform configurations, processes, and procedures.
Troubleshoot and resolve platform issues, ensuring minimal downtime and optimal performance.
Implement best practices for security, scalability, and automation within the ML platform ecosystem.
Mandatory Skills
Experience with Cloudera Data Science Workbench (CDSW) or Cloudera Data Platform (CDP).
Proficiency in containerization and orchestration using Docker and Kubernetes.
Solid scripting and automation skills in Python and Ansible.
Experience with GitLab for source control and CI/CD automation.
Understanding of MLOps principles and practices.
Familiarity with patching, updating, and maintaining platform infrastructure.
In-depth Unix knowledge.
Excellent problem-solving skills and a collaborative approach to team projects.
Strong experience developing enterprise-level applications in Python.
Proficiency in designing, developing, and maintaining distributed systems and services.
Experience using Ansible for platform infrastructure as code (IaC), deployment automation, and configuration management.
Location: Kharadi (Pune)
Experience: 6-13 years
Work Mode: Hybrid (3 days in office)
Senior Platform Engineer • Pune, Maharashtra, India