Key Responsibilities
- Develop and maintain CI/CD pipelines for machine learning models.
- Automate model deployment, monitoring, and scaling processes.
- Implement and manage version control for code, data, and models.
- Ensure data quality, security, and compliance throughout the ML lifecycle.
- Collaborate with data scientists, engineers, and other stakeholders.
- Optimize infrastructure for ML workloads.
- Troubleshoot and resolve issues in production ML systems.
- Implement logging, monitoring, and alerting for ML pipelines.
- Manage and optimize cloud resources for ML/AI workloads.
- Facilitate knowledge sharing and best practices across teams.
- Mentor peer developers on MLOps/AIOps and DevOps.
- Automate workflows for multi-model deployment on servers and embedded systems.
- Develop and manage APIs that serve multiple deep learning models efficiently, using Flask, FastAPI, or a similar framework.
- Optimize and convert models for embedded, PC, Android, and server deployment (e.g., TFLite, ONNX, .NEF), applying quantization and pruning.
Required Technical Competencies
- Programming languages: Python or similar.
- ML frameworks: TensorFlow, PyTorch, scikit-learn.
- Cloud platforms: AWS (preferred), Azure, or GCP.
- DevOps tools: Docker, Kubernetes, Jenkins, GitLab CI, Azure DevOps.
- Infrastructure as Code: Terraform, Ansible, CloudFormation.
- Big data technologies: Spark, Hadoop, Kafka.
- Monitoring and logging tools: ELK stack, Prometheus, Grafana.
- Version control: Git.
- Database management: SQL and NoSQL databases.
- Data pipeline tools: Airflow, Kubeflow, or similar.
- CI/CD methodologies and tools for ML workflows.
- Model serving: Flask, FastAPI, TorchServe, TensorFlow Serving.
- Understanding of ML algorithms and model performance metrics.
- Knowledge of data privacy and security best practices.
- Familiarity with MLOps principles and tools: MLflow, DVC, Weights & Biases.
(ref: hirist.tech)
Skills Required
Airflow, Prometheus, ELK Stack, Kafka, Grafana, TensorFlow, NoSQL, PyTorch, Terraform, Docker, Flask, Python, Azure DevOps, AWS, Hadoop, CloudFormation, SQL, Jenkins, Git, GCP, Ansible, Spark, FastAPI, Azure, Kubernetes