We're Hiring!

I'm excited to share that we're looking for an SRE and DevOps (ML Framework) engineer to join our team at ITC Infotech. Below is the JD for your reference.

Job Functions:
- You will be a member of our AI Platform Team, supporting the next-generation AI architecture for various research and engineering teams within the organization.
- You'll partner with vendors and the infrastructure engineering team on security and service availability.
- You'll fix production issues, including performance and functional issues, together with engineering teams, researchers, and data scientists.
- Diagnose and solve customer technical problems.
- Participate in training customers and prepare reports on customer issues.
- Be responsible for customer service improvements and recommend product improvements.
- Write support documentation.
- You'll design and implement zero-downtime practices and monitoring to achieve a highly available service (99.999%).
- As a support engineer, find opportunities to automate as part of the problem management process, creating automation that prevents issues from recurring.
- Define engineering excellence for operational maturity.
- You'll work together with AI platform developers to provide the CI/CD model that deploys and configures the production system automatically.
- Develop and follow standard operating processes for tools and automation development, including style guides, versioning practices, source control, and branching and merging patterns, and advise other engineers on development standards.
- Deliver solutions that accelerate the activities phenomenal engineers would perform, through automation, deep domain expertise, and knowledge sharing.

Required Skills:
- Demonstrated ability in designing, building, refactoring, and releasing software written in Python.
- Hands-on experience with ML frameworks such as PyTorch, TensorFlow, and Triton.
- Ability to handle framework-related issues, version upgrades, and compatibility with data processing and model training environments.
- Experience with AI/ML model training and inferencing platforms is a big plus.
- Experience with LLM fine-tuning systems is a big plus.
- Debugging and triaging skills.
- Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
- Familiarity with DevOps practices and continuous testing.
- DevOps pipelines and automation: app deployment, configuration, and performance monitoring.
- Test automation and Jenkins CI/CD.
- Excellent communication, presentation, and leadership skills, with the ability to collaborate with partners, customers, and engineering teams.
- Well organized and able to manage multiple projects in a fast-paced and demanding environment.
- Good spoken, reading, and written English.

Job Location: Bangalore

If you're interested or know someone who might be a great fit, please reach out or apply.