Job Description :
We are a rapidly expanding AI startup that is transforming digital advertising with groundbreaking AI technology and innovative solutions. We are seeking a dynamic and highly skilled Lead MLOps Engineer to join our team and play a critical role in building the future of advertising.
As a Lead MLOps Engineer, you will be at the forefront of deploying and scaling cutting-edge machine learning models, including custom Transformer-based architectures, to predict consumer behavior. You will take ownership of managing high-performance inference systems, ensuring the seamless delivery of daily predictions for millions of consumers at an unprecedented scale.
The ideal candidate will be a seasoned expert in machine learning operations with deep experience managing large-scale data environments. You should have a proven track record of deploying end-to-end machine learning solutions at scale on cloud platforms (preferably AWS), with a focus on performance, cost optimization, and reliability.
Key Responsibilities :
- Lead MLOps Strategy : Architect and implement machine learning pipelines capable of handling millions of customer predictions.
- Build Scalable Infrastructure : Design and build highly scalable and reliable cloud-based infrastructure on AWS to support the training, deployment, and monitoring of machine learning models at scale.
- Cost and Resource Management : Optimize the use of cloud resources, ensuring cost-effective scaling while maintaining high availability and performance standards.
- CI / CD Pipeline Implementation : Develop and optimize CI / CD pipelines specifically for ML models, ensuring smooth transitions from development to production.
- Automation and Optimization : Implement automation tools for model lifecycle management, model retraining, and data pipeline management.
- Model Monitoring and Performance : Oversee the development and implementation of robust monitoring solutions to track model performance, identify issues, and ensure that models continue to meet business objectives.
- Collaboration : Work closely with cross-functional teams to ensure alignment between business needs, model performance, and infrastructure capabilities.
- Documentation : Document processes, methodologies, and findings comprehensively, and keep all documentation accurate and up to date.
- Innovation and Research : Stay up to date with new machine learning techniques, tools, and technologies, and apply this knowledge to improve existing solutions.
Qualifications :
- Bachelor's, Master's, or PhD in Computer Science, Engineering, AI, or a related field
- At least 5 years of experience in machine learning operations
- Extensive experience deploying, managing, and scaling AI / ML workloads for large-scale data on AWS services such as EC2, SageMaker, and Lambda
- Proficiency in Docker, Kubernetes, and container orchestration, with experience deploying machine learning models in these environments
- Proven track record in designing and implementing CI / CD pipelines for ML models
- Experience in performance tuning, cost optimization, and resource management in cloud-based environments
- Strong understanding of machine learning concepts, including supervised and unsupervised learning and deep learning
- Strong programming skills in Python, and experience with frameworks and libraries such as TensorFlow, PyTorch, scikit-learn, and Keras
- Excellent problem-solving and analytical skills
- Strong communication and collaboration skills
Nice to have :
- Prior experience working in a fast-paced startup environment
What we offer :
We offer a competitive salary, benefits, and a dynamic work environment. If you are passionate about AI and predicting consumer behavior, have a strong entrepreneurial spirit, and want to be part of a rapidly growing startup, we encourage you to apply for this exciting opportunity!