Machine Learning Observability Platform Engineer

Mewar Infotech Limited • Erode, IN
19 hours ago
Job description

We’re looking for a Machine Learning Observability Platform Engineer who’s passionate about building large-scale, reliable ML systems. You’ll help design and enhance our open-source observability platform, adding AI capabilities that power critical insights across enterprise environments.

What You’ll Do

  • Build and maintain AI/ML features for an open-source observability platform built on Grafana and ClickHouse.
  • Collaborate with SREs, service owners, and observability SMEs to ensure scalable, reliable ML model deployment.
  • Design and manage data pipelines using Databricks and related tools.
  • Use CI/CD and MLOps best practices to automate model deployment and testing.
  • Deploy and manage ML infrastructure on AWS or Azure.
  • Set up and integrate MCP (Model Context Protocol) servers and connect tools across observability systems.
  • Establish prompt standards and develop custom MCP integrations between systems.
  • Troubleshoot ML system performance and reliability using OpenTelemetry pipelines and observability metrics.

What We’re Looking For

  • Master’s degree in Computer Science, Engineering, or Artificial Intelligence (or equivalent experience).
  • Proven experience designing, developing, and operating ML systems in production.
  • Hands-on experience with LLMs/MCPs, Grafana, and Databricks.
  • Strong coding skills in Python.
  • Familiarity with Kubernetes, container orchestration, and cloud platforms (AWS/Azure).
  • Solid understanding of observability pillars (metrics, logs, traces).
  • Experience implementing OpenTelemetry pipelines for ML systems.
  • Knowledge of CI/CD, MLOps, and monitoring best practices.