Site Reliability Engineer

ACL Digital, Mumbai, IN
13 days ago
Job description

About the Company

ACL Digital is hiring for the position below.

ACL Digital, part of the ALTEN Group, is a trusted AI-led, Digital & Systems Engineering Partner driving innovation by designing and building intelligent systems across the full technology stack — from chip to cloud. By integrating AI and data-powered solutions, we help enterprises accelerate digital transformation, optimize operations, and achieve scalable business outcomes. Partner with us to turn complexity into clarity and shape the future of your organization.

About the Role

Ability to handle framework-related issues, version upgrades, and compatibility with data processing and model training environments. Experience with AI/ML model training and inference platforms is a big plus, as is experience with LLM fine-tuning systems.

Experience: 5+ years

Responsibilities

  • Debugging and triaging skills.
  • Cloud technologies like Kubernetes, Docker and Linux fundamentals.
  • Familiarity with DevOps practices and continuous testing.
  • DevOps pipelines and automation: app deployment, configuration, and performance monitoring.
  • Test automation and Jenkins CI/CD.
  • Excellent communication, presentation, and leadership skills for collaborating with partners, customers, and engineering teams.
  • Well organized and able to manage multiple projects in a fast-paced, demanding environment.
  • Good spoken and written English, including reading comprehension.

Required Skills

PyTorch, TensorFlow, Triton, Kubernetes, Docker and Linux fundamentals, test automation, Jenkins CI/CD, SRE/DevOps (ML frameworks)
