LLM Evaluation Engineer (GenAI QE)

Eucloid Data Solutions • Gurugram, Haryana, India

16 days ago

Job description

Note:

Please apply only if you:

Have 6 or more years of relevant experience (excluding internships)
Are comfortable working 5 days a week from Gurugram, Haryana
Are an immediate joiner or are currently serving your notice period

About Eucloid

At Eucloid, innovation meets impact. As a leader in AI and Data Science, we create solutions that redefine industries, from Hi-tech and D2C to Healthcare and SaaS. Through partnerships with giants like Databricks, Google Cloud, and Adobe, we’re pushing boundaries and building next-gen technology.

Join our talented team of engineers, scientists, and visionaries from top institutes like IITs, IIMs, and NITs. At Eucloid, growth is a promise, and your work will drive transformative results for Fortune 100 clients.

What You’ll Do

Design and implement robust frameworks for evaluating large language models (LLMs) across dimensions like accuracy, safety, hallucination, and reasoning.

Build modular pipelines for automated, semi-automated, and human-in-the-loop evaluations.

Integrate GenAI testing tools such as Giskard, RAGAS, DeepEval, TruLens, Opik/Comet, and LangSmith.

Define and implement custom evaluation metrics tailored to use cases like RAG, agents, and safety guardrails.

Curate or generate high-quality evaluation datasets across domains (e.g., legal, medical, QA, coding).

Collaborate with developers to instrument tracing and logging that capture real-world model behavior.

Build dashboards and reporting mechanisms to visualize performance, regressions, and model comparisons.

Conduct prompt-based testing, chain-of-thought evaluations, adversarial testing, and A/B comparisons.

Contribute to red-teaming and stress-testing efforts to uncover vulnerabilities and ethical risks.

What Makes You a Fit

Academic Background:

Bachelor’s or Master’s degree in Computer Science, Data Science, Artificial Intelligence, or a related field.

Technical Expertise:

A minimum of 6 years of hands-on experience building, testing, or evaluating AI/ML systems, with a strong focus on LLMs or Generative AI applications.

Proficiency in Python, along with experience using ML/NLP libraries such as Hugging Face, LangChain, OpenAI SDK, or Cohere.

Experience in building evaluation pipelines or benchmarks for LLM performance across metrics like accuracy, robustness, safety, and hallucination.

Deep understanding of prompt engineering, retrieval-augmented generation (RAG), and agentic evaluation techniques.

Hands-on familiarity with evaluation tools such as Giskard, RAGAS, DeepEval, TruLens, LangSmith, Opik/Comet, or similar.

Working knowledge of vector databases like FAISS, Pinecone, or Weaviate, and embedding-based evaluation methods.

Experience with CI/CD pipelines, unit/integration testing for LLM apps, and model versioning for reproducibility.

Ability to define custom evaluation metrics tailored to specific use cases (e.g., RAG performance, guardrail compliance, hallucination detection).

Strong grasp of model instrumentation techniques for tracing/logging model behavior in real-world flows.

Extra Skills:

Experience in developing LLM-based applications such as chatbots, copilots, or RAG systems.

Exposure to designing or evaluating AI safety systems (e.g., jailbreaking prevention, content filters).

Open-source contributions to GenAI tooling or evaluation libraries.

Strong communication and documentation skills.

Comfort working in fast-paced, research-heavy environments.

Why You’ll Love It Here

Innovate with the Best Tech: Work on groundbreaking projects using AI, GenAI, LLMs, and massive-scale data platforms. Tackle challenges that push the boundaries of innovation.

Impact Industry Giants: Deliver business-critical solutions for Fortune 100 clients across Hi-tech, D2C, Healthcare, SaaS, and Retail. Partner with platforms like Databricks, Google Cloud, and Adobe to create high-impact products.

Collaborate with a World-Class Team: Join exceptional professionals from IITs, IIMs, NITs, and global leaders like Walmart, Amazon, Accenture, and ZS. Learn, grow, and lead in a team that values expertise and collaboration.

Accelerate Your Growth: Access our Centres of Excellence to upskill and work on industry-leading innovations. Your professional development is a top priority.

Work in a Culture of Excellence: Be part of a dynamic workplace that fosters creativity, teamwork, and a passion for building transformative solutions. Your contributions will be recognized and celebrated.

About Our Leadership

Anuj Gupta – Former Amazon leader with over 22 years of experience building and managing large engineering teams (B.Tech, IIT Delhi; MBA, ISB Hyderabad).

Raghvendra Kushwah – Business consulting expert with 21+ years at Accenture and Cognizant (B.Tech, IIT Delhi; MBA, IIM Lucknow).

Key Benefits

Competitive salary and performance-based bonus.

Comprehensive benefits package, including health insurance and flexible work hours.

Opportunities for professional development and career growth.

Location: Gurugram

Submit your resume to saurabh.bhaumik@eucloid.com with the subject line “Application: Role Name”.

Eucloid is an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive environment.