Talent.com
MLOps Site Reliability Engineer

KLA, Chennai, India
17 hours ago
Job description

We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows. This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.

Responsibilities:

  • Design, implement, and maintain scalable and reliable machine learning infrastructure.
  • Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.
  • Develop and maintain CI/CD pipelines for machine learning workflows.
  • Monitor and optimize the performance of machine learning systems and infrastructure.
  • Implement and manage automated testing and validation processes for machine learning models.
  • Ensure the security and compliance of machine learning systems and data.
  • Troubleshoot and resolve issues related to machine learning infrastructure and workflows.
  • Document processes, procedures, and best practices for machine learning operations.
  • Stay up to date with the latest developments in MLOps and related technologies.

Required Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
  • Strong knowledge of machine learning concepts and workflows.
  • Proficiency in programming languages such as Python, Java, or Go.
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud.
  • Familiarity with containerization technologies like Docker and Kubernetes.
  • Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
  • Strong problem-solving skills and the ability to troubleshoot complex issues.
  • Excellent communication and collaboration skills.

Preferred Qualifications:

  • Master's degree in Computer Science, Engineering, or a related field.
  • Experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
  • Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Airflow.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
  • Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.
  • Experience with automated testing frameworks for machine learning models.
  • Knowledge of security best practices for machine learning systems and data.

Minimum Qualifications:

  • Master's or Bachelor's degree with 2 years of related work experience.

Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees, either directly or on behalf of KLA. Please ensure that you have searched for legitimate job postings. KLA follows a recruiting process that involves multiple interviews, in person or by video conference, with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to to confirm the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.
