AI Operations Engineer

GSPANN • Hyderabad, India
Job description

GSPANN is hiring an AI Operations Engineer. The role focuses on deploying ML models, automating CI/CD pipelines, and implementing AIOps solutions.

Role and Responsibilities

  • Build, automate, and manage continuous integration and continuous deployment (CI/CD) pipelines for machine learning (ML) models.
  • Partner with data scientists to transition ML models from experimentation to production environments.
  • Use tools such as Docker, Kubernetes, MLflow, or Kubeflow to deploy, monitor, and maintain scalable ML systems.
  • Implement systems for model versioning, model drift detection, and performance tracking.
  • Maintain reproducibility and traceability of ML experiments and outputs.
  • Design and implement AIOps frameworks to enable predictive monitoring, anomaly detection, and intelligent alerting.
  • Leverage observability tools, including Prometheus, Grafana, ELK (Elasticsearch, Logstash, Kibana), Dynatrace, and Datadog, to collect and analyze infrastructure and application metrics.
  • Utilize machine learning and statistical methods to identify patterns, automate root cause analysis, and forecast system behavior.
  • Write and maintain automation scripts that support remediation and incident response.

Skills and Experience

  • Bachelor's or Master's degree in Computer Science, Data Science, Information Technology, or a related discipline.
  • Certification in DevOps or MLOps from AWS or GCP is preferred.
  • Understand Site Reliability Engineering (SRE) practices and metrics such as Service-Level Agreements (SLAs), Service-Level Indicators (SLIs), and Service-Level Objectives (SLOs).
  • Demonstrate strong programming proficiency in Python.
  • Work with ML lifecycle platforms such as MLflow, Kubeflow, TensorFlow Extended (TFX), and Data Version Control (DVC).
  • Use Docker and Kubernetes for containerization and orchestration.
  • Employ CI/CD tools including GitHub Actions, Jenkins, and GitLab CI/CD.
  • Operate monitoring and logging systems like Prometheus, Grafana, ELK, Datadog, and Splunk.
  • Possess hands-on experience with cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
  • Apply DevOps principles effectively and use infrastructure-as-code tools like Terraform and Ansible.
  • Handle projects involving Natural Language Processing (NLP), time-series forecasting, or anomaly detection models.
  • Build and manage large-scale distributed computing environments.