SRE & DevOps (ML Framework) - AI Platform
Location: Bangalore
Mode: Hybrid
Required Skills:
- Demonstrated ability in designing, building, refactoring and releasing software written in Python.
- Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton.
- Ability to handle framework-related issues, version upgrades, and compatibility with data processing and model training environments.
- Experience with AI/ML model training and inferencing platforms is a big plus.
- Experience with LLM fine-tuning systems is a big plus.
- Debugging and triaging skills.
- Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
- Familiarity with DevOps practices and continuous testing.
- DevOps pipelines and automation: app deployment/configuration and performance monitoring.
- Test automation and Jenkins CI/CD.
- Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
- Well organized and able to manage multiple projects in a fast-paced, demanding environment.
- Good English speaking, reading, and writing ability.