SRE & DevOps Engineer (ML / AI Platform)

Prospance Inc, Kakinada, IN
2 days ago
Job description

Contract Position | Global E-Commerce Leader | Hybrid

About the Opportunity

We're partnering with a leading global e-commerce company to find an exceptional SRE & DevOps Engineer to join their AI Platform Team. This is your chance to shape the future of machine learning infrastructure that powers innovation for millions of users worldwide.

As part of this transformative role, you'll support cutting-edge AI platforms and services, working alongside researchers, data scientists, and engineering teams in a purpose-driven, inclusive environment.

What You'll Do

Platform Operations & Support

  • Support next-generation AI architecture for research and engineering teams
  • Partner with vendors and infrastructure teams to ensure security and 99.999% service availability
  • Diagnose and resolve production issues, including performance and functional challenges
  • Provide technical support to customers and document solutions

DevOps & Automation

  • Design and implement zero-downtime monitoring for highly available services
  • Build CI/CD pipelines for automated deployment and configuration
  • Identify automation opportunities to streamline problem management
  • Develop operational standards for tools, versioning, source control, and deployment practices

Continuous Improvement

  • Drive customer service enhancements and recommend product improvements
  • Define engineering excellence and operational maturity standards
  • Conduct customer training and generate insights reports
  • Accelerate team efficiency through automation and knowledge sharing

What You Bring

Required Expertise

  • 5+ years of relevant experience
  • Strong Python development skills, including data structures and algorithms, with experience designing, building, and releasing production software
  • Hands-on experience with ML frameworks: PyTorch, TensorFlow, Triton
  • Cloud-native technologies: Kubernetes, Docker, Linux
  • DevOps proficiency: CI/CD pipelines, Jenkins, test automation
  • Framework troubleshooting: version upgrades, compatibility management
  • Excellent debugging and triaging capabilities

Preferred Skills

  • Experience with AI/ML model training and inference platforms
  • LLM fine-tuning systems knowledge
  • Performance monitoring and application deployment automation
