SRE and DevOps ML Framework

ITC Infotech, Nagpur, IN
1 day ago
Job description

🔍 We're Hiring! I'm excited to share that we're looking for an SRE and DevOps engineer (ML Framework) to join our team at ITC Infotech.

Below is the JD for your reference.

Job Functions:

  • You will be a member of our AI Platform Team, supporting the next-generation AI architecture for various research and engineering teams within the organization.
  • You'll partner with vendors and the infrastructure engineering team for security and service availability
  • You'll work with engineering teams, researchers, and data scientists to fix production issues, including performance and functional issues
  • Diagnose and solve customer technical problems
  • Participate in training customers and prepare reports on customer issues
  • Be responsible for customer service improvements and recommend product improvements
  • Write support documentation
  • You'll design and implement zero-downtime practices and monitoring to achieve a highly available service (99.999%)
  • As a support engineer, find opportunities to automate within the problem management process, and build automation that prevents issues
  • Define engineering excellence for operational maturity
  • You'll work together with AI platform developers to provide a CI/CD model that deploys and configures the production system automatically
  • Develop and follow standard operating processes for tools and automation development, including style guides, versioning practices, source control, and branching and merging patterns, and advise other engineers on development standards
  • Deliver solutions that accelerate the activities phenomenal engineers would perform, through automation, deep domain expertise, and knowledge sharing

Required Skills:

  • Demonstrated ability in designing, building, refactoring and releasing software written in Python.
  • Hands-on experience with ML frameworks such as PyTorch, TensorFlow, Triton
  • Ability to handle framework-related issues, version upgrades, and compatibility with data processing and model training environments
  • Experience with AI/ML model training and inferencing platforms is a big plus
  • Experience with LLM fine-tuning systems is a big plus
  • Debugging and triaging skills
  • Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals
  • Familiar with DevOps practices and continuous testing
  • DevOps pipelines and automation: app deployment, configuration, and performance monitoring
  • Test automation, Jenkins CI/CD
  • Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams
  • Well organized and able to manage multiple projects in a fast-paced and demanding environment
  • Good English speaking, reading, and writing ability.
  • Job Location: Bangalore

If you're interested or know someone who might be a great fit, please reach out or apply.
