The Job in short
As a Principal AI Evaluation Engineer, you will lead the evaluation efforts in our AI-powered SDLC team. You will own the evaluation strategy for AI assistants and agentic workflows, ensuring they are reliable, observable, and safeguarded with strong guardrails. Beyond hands-on work, you will mentor engineers, lead triage and reporting, and make evaluation a cornerstone of release decisions.
Meet the job
- Define and lead the evaluation strategy and roadmap for our core AI-powered SDLC product.
- Build and oversee evaluation pipelines and guardrails.
- Build and maintain evaluation datasets (synthetic and real project data) to benchmark AI behavior.
- Analyze evaluation results, identify gaps, and produce clear, actionable reports for engineering and product stakeholders.
- Build a culture of innovation and excellence, encouraging continuous improvement and adoption of best practices in AI evaluation and deployment.
- Collaborate with cross-functional teams to integrate evaluation insights into development.
How about you
- Strong understanding of software engineering principles and the software development lifecycle (SDLC).
- Hands-on experience with test design, test management, observability, and data analysis.
- Proficiency in Python (or another scripting language) for automating evaluations.
- Familiarity with AI agent evaluation methods and metrics (faithfulness, answer relevancy, contextual accuracy, tool correctness).
- Excellent analytical and problem-solving skills.
- Strong communication and collaboration abilities, able to work with cross-functional teams and stakeholders.