MLOps Site Reliability Engineer

KLA, Chennai, Tamil Nadu, India
30+ days ago
Job description

We are seeking a highly skilled and motivated MLOps Site Reliability Engineer (SRE) to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our machine learning infrastructure. You will work closely with data scientists, machine learning engineers, and software developers to build and maintain robust and efficient systems that support our machine learning workflows. This position offers an exciting opportunity to work on cutting-edge technologies and make a significant impact on our organization's success.

Responsibilities:

  • Design, implement, and maintain scalable and reliable machine learning infrastructure.
  • Collaborate with data scientists and machine learning engineers to deploy and manage machine learning models in production.
  • Develop and maintain CI/CD pipelines for machine learning workflows.
  • Monitor and optimize the performance of machine learning systems and infrastructure.
  • Implement and manage automated testing and validation processes for machine learning models.
  • Ensure the security and compliance of machine learning systems and data.
  • Troubleshoot and resolve issues related to machine learning infrastructure and workflows.
  • Document processes, procedures, and best practices for machine learning operations.
  • Stay up-to-date with the latest developments in MLOps and related technologies.

Required Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Site Reliability Engineer (SRE) or in a similar role.
  • Strong knowledge of machine learning concepts and workflows.
  • Proficiency in programming languages such as Python, Java, or Go.
  • Experience with cloud platforms such as AWS, Azure, or Google Cloud.
  • Familiarity with containerization technologies like Docker and Kubernetes.
  • Experience with CI/CD tools such as Jenkins, GitLab CI, or CircleCI.
  • Strong problem-solving skills and the ability to troubleshoot complex issues.
  • Excellent communication and collaboration skills.

Preferred Qualifications:

  • Master's degree in Computer Science, Engineering, or a related field.
  • Experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
  • Knowledge of data engineering and data pipeline tools such as Apache Spark, Apache Kafka, or Airflow.
  • Experience with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
  • Familiarity with infrastructure as code (IaC) tools like Terraform or Ansible.
  • Experience with automated testing frameworks for machine learning models.
  • Knowledge of security best practices for machine learning systems and data.

Minimum Qualifications:

Master's or Bachelor's degree and 2 years of related work experience.

    Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons that are currently posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees either directly or on behalf of KLA. Please ensure that you have searched for legitimate job postings. KLA follows a recruiting process that involves multiple interviews in person or on video conferencing with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or that an employee is not legitimate, please send an email to to confirm the person you are communicating with is an employee. We take your privacy very seriously and confidentially handle your information.
