We're Hiring! I'm excited to share that we're looking for an SRE and DevOps - ML Framework engineer to join our team at ITC Infotech.
Below is the JD for your reference.
Job Functions:
- You will be a member of our AI Platform Team, supporting the next-generation AI architecture for various research and engineering teams within the organization.
- You'll partner with vendors and the infrastructure engineering team to ensure security and service availability
- You'll work with engineering teams, researchers, and data scientists to fix production issues, including performance and functional issues
- Diagnose and solve customer technical problems
- Participate in training customers and prepare reports on customer issues
- Drive customer service improvements and recommend product improvements
- Write support documentation
- You'll design and implement zero-downtime operations and monitoring to deliver a highly available service (99.999% availability)
- As a support engineer, identify automation opportunities within the problem management process and build automation that prevents issues from recurring
- Define engineering excellence for operational maturity
- You'll work with AI platform developers to provide a CI/CD pipeline that deploys and configures the production system automatically
- Develop and follow standard operational processes for tools and automation development, including style guides, versioning practices, source control, and branching and merging patterns, and advise other engineers on development standards
- Deliver solutions that accelerate the work engineers perform, through automation, deep domain expertise, and knowledge sharing
Required Skills:
- Demonstrated ability in designing, building, refactoring, and releasing software written in Python
- Hands-on experience with ML frameworks such as PyTorch, TensorFlow, and Triton
- Ability to handle framework-related issues, version upgrades, and compatibility with data processing and model training environments
- Experience with AI/ML model training and inference platforms is a big plus
- Experience with LLM fine-tuning systems is a big plus
- Debugging and triaging skills
- Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals
- Familiarity with DevOps practices and continuous testing
- DevOps pipelines and automation: app deployment, configuration, and performance monitoring
- Test automation and Jenkins CI/CD
- Excellent communication, presentation, and leadership skills to work and collaborate with partners, customers, and engineering teams
- Well organized and able to manage multiple projects in a fast-paced, demanding environment
- Good spoken and written English
Job Location: Bangalore
If you're interested or know someone who might be a great fit, please reach out or apply.