Talent.com
Dexian India
No longer accepting applications
AI/ML Observability Engineer

Dexian India • Delhi, India
23 days ago
Job description
Overview

We are seeking a passionate and hands-on AI/ML Engineer to accelerate our Enterprise Observability strategy. This role will design, build, and operationalize AI/ML capabilities that enhance end-to-end telemetry pipelines, anomaly detection, intelligent alerting, and proactive system resiliency. You will work at the intersection of AI/ML engineering, Observability platforms, and automation, developing solutions that improve detection, diagnosis, and prevention of operational issues across distributed systems.

Key Responsibilities

• Design and deploy AI/ML models supporting anomaly detection, baselining, event correlation, and predictive operational analytics.
• Build and integrate AI-enabled capabilities into enterprise Observability platforms, including Grafana, APM/RUM tools, network telemetry systems, and data observability tools.
• Develop AI Agents that can autonomously triage issues, recommend corrective actions, and initiate automated remediation workflows to reduce recovery time and improve system resilience.
• Implement self-healing automation using AI-driven decisioning, integrating with orchestration frameworks, service APIs, and infrastructure automation pipelines.
• Engineer and maintain real-time and batch data pipelines using Snowflake ML Jobs, Snowflake Cortex, streams, tasks, and UDFs.
• Implement and manage OpenTelemetry-based telemetry ingestion for logs, metrics, traces, and spans across distributed systems.
• Build asynchronous Python APIs and services for model inferencing and operational integration.
• Enhance observability intelligence with AI-powered capabilities such as root-cause acceleration, chatbot/search enablement, and automated insights.
• Contribute to SLO/SLI modeling, Golden Signals instrumentation, and Observability NFR adoption.
• Collaborate across engineering, SRE, platform and business teams to embed proactive intelligence and Observability standards throughout the ecosystem.

Required Skills & Qualifications

Core Technical Skills

• Strong proficiency in Python and data science/ML libraries: NumPy, Pandas, scikit-learn, TensorFlow, PyTorch, Matplotlib, Seaborn.
• Experience with Generative AI, LLM fine-tuning, prompt engineering, RAG pipelines, and LLM evaluation frameworks.
• Expertise in developing and deploying ML models in production (batch & streaming).
• Strong understanding of statistics, time series modeling, and anomaly detection.

Observability & Telemetry

• Experience with OpenTelemetry for logs, metrics, traces, spans.
• Familiarity with Observability concepts: Golden Signals, SLO/SLI design, APM, RUM, Synthetics, event correlation, baselining.
• Experience with Observability tools such as: Grafana (Alloy agents, dashboards, ML capabilities), Dynatrace, Monte Carlo (Data Observability), Netscout, ThousandEyes, SolarWinds, NetBrain.

Cloud, Data & Platform

• Hands-on experience with AWS (SageMaker, Bedrock), Snowflake ML, Snowflake/Openflow, Snowflake AI Observability tooling.
• Experience building Snowflake data pipelines (streams, tasks, UDFs); experience with Cortex features is a plus.
• Strong understanding of distributed systems and microservices telemetry requirements.

Automation & Engineering Quality

• Experience with automation pipelines, CI/CD, and infrastructure-as-code patterns supporting Observability adoption.
• Ability to build asynchronous Python APIs or services for model inference and operational integration.

Preferred Qualifications

• Experience developing agentic AI systems that analyze telemetry, generate action recommendations, or execute automated operational responses.
• Experience building self-healing patterns, including automated rollback, service restarts, configuration corrections, and predictive maintenance.
• Experience in Snowflake ML workflows, Snowflake Cortex Agents, and data pipeline automation.
• Exposure to AI-enabled alerting, RCA automation, and operational self-healing concepts.
• Experience with large-scale operational telemetry and multi-cloud ecosystems.

Soft Skills

• Strong analytical thinking and problem-solving.
• Excellent communication skills for cross-functional collaboration with infrastructure, SRE, engineering, business, and leadership teams.
• Curiosity, continuous learning mindset, and passion for applied AI and Observability.
