Talent.com
LLM Reliability & Evaluation Engineer

Confidential • Nagar, Sahibzada Ajit Singh Nagar, India
5 days ago
Job description

About XenonStack

XenonStack is the fastest-growing Data and AI Foundry for Agentic Systems, enabling enterprises to gain real-time, intelligent business insights.

We deliver innovation through:

  • Agentic Systems for AI Agents → akira.ai
  • Vision AI Platform → xenonstack.ai
  • Inference AI Infrastructure for Agentic Systems → nexastack.ai

Our mission is to accelerate the world's transition to AI + Human Intelligence by making AI agents reliable, explainable, and enterprise-ready.

THE OPPORTUNITY

We are seeking an LLM Reliability & Evaluation Engineer to ensure that large language models (LLMs) and agentic AI systems meet enterprise-grade standards of accuracy, safety, and trustworthiness.

This role focuses on evaluating, benchmarking, and stress-testing LLMs in real-world workflows, and on building frameworks for reliability, robustness, and continuous improvement. If you thrive at the intersection of AI research, applied testing, and responsible deployment, this is the role for you.

Key Responsibilities

  • Evaluation Frameworks
      • Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias.
      • Develop automated systems for benchmarking models on enterprise-relevant tasks.
  • Reliability Engineering
      • Conduct stress tests, adversarial testing, and edge-case evaluations.
      • Build tools to measure latency, consistency, and error recovery in multi-turn interactions.
  • Metrics & Monitoring
      • Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment.
      • Establish real-time monitoring for drift, anomalies, and performance regressions.
  • Collaboration & Alignment
      • Partner with ML engineers, product managers, and domain experts to align evaluation with business objectives.
      • Work with Responsible AI teams to implement ethical, explainable, and compliant evaluation practices.
  • Continuous Improvement
      • Feed insights from evaluation into fine-tuning, RLHF/RLAIF pipelines, and model selection.
      • Maintain a central repository of test cases, benchmarks, and evaluation results.
  • Research & Innovation
      • Stay current with state-of-the-art LLM evaluation techniques, from academic benchmarks to applied enterprise metrics.
      • Explore automated evaluation using agentic test harnesses and synthetic data generation.

Skills & Qualifications

Must-Have

  • 3–6 years of experience in AI/ML, NLP, or applied model evaluation.
  • Strong understanding of LLM architectures, prompt engineering, and failure modes.
  • Hands-on experience with evaluation frameworks (e.g., eval harnesses, Ragas, OpenAI Evals, DeepEval).
  • Proficiency in Python and libraries/frameworks such as LangChain, LangGraph, LlamaIndex, and Hugging Face.
  • Experience with vector databases, RAG pipelines, and knowledge graph integration.
  • Familiarity with bias/fairness testing and Responsible AI frameworks.

Good-to-Have

  • Experience with reinforcement learning (RLHF, RLAIF) and reward modeling.
  • Exposure to agentic evaluation frameworks (multi-agent stress testing, synthetic user simulators).
  • Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases.
  • Contributions to open-source evaluation libraries or research papers.

WHY SHOULD YOU JOIN US

  • Agentic AI Product Company – Ensure reliability in cutting-edge AI platforms that are redefining enterprise adoption.
  • A Fast-Growing Category Leader – Be part of one of the fastest-growing AI Foundries, powering Fortune 500 enterprises with trustworthy AI.
  • Career Mobility & Growth – Grow into roles such as AI Systems Architect, Responsible AI Engineer, or Reliability Engineering Lead.
  • Global Exposure – Work on enterprise-scale evaluation challenges across BFSI, Healthcare, Telecom, and GRC.
  • Create Real Impact – Your evaluations will directly shape production-grade AI agents used in mission-critical systems.
  • Culture of Excellence – Our values of Agency, Taste, Ownership, Mastery, Impatience, and Customer Obsession empower you to innovate fearlessly.
  • Responsible AI First – Join a company that prioritizes trustworthy, explainable, and compliant AI.

XENONSTACK CULTURE – JOIN US & MAKE AN IMPACT!

At XenonStack, we believe in shaping the future of intelligent systems. We foster a culture of cultivation built on bold, human-centric leadership principles, where deep work, simplicity, and adoption define everything we do.

Our Cultural Values

  • Agency – Be self-directed and proactive.
  • Taste – Sweat the details and build with precision.
  • Ownership – Take responsibility for outcomes.
  • Mastery – Commit to continuous learning and growth.
  • Impatience – Move fast and embrace progress.
  • Customer Obsession – Always put the customer first.

Our Product Philosophy

  • Obsessed with Adoption – Making AI accessible, reliable, and enterprise-ready.
  • Obsessed with Simplicity – Turning complex evaluation challenges into seamless, automated frameworks.

Be part of our mission to accelerate the world's transition to AI + Human Intelligence by making AI agents not just powerful, but trustworthy and reliable.

Skills Required

Python
