Who we are
Mindtickle is the market-leading revenue productivity platform that combines on-the-job learning and deal execution to get more revenue per rep. Mindtickle is recognized as a market leader by top industry analysts and is ranked by G2 as the #1 sales onboarding and training product. We're honoured to be recognized as a Leader in the first-ever Forrester Wave™ : Revenue Enablement Platforms, Q3 2024!
What's in it for you
- Own the end-to-end qualification lifecycle for AI / LLM systems from ideation and implementation to CI / CD integration.
- Design and implement scalable automated test suites across unit, integration, regression, and system levels.
- Build and enhance frameworks to test, evaluate, and continuously improve complex AI and LLM workflows.
- Lead the design and automation of LLM-powered features, including prompt pipelines, RAG workflows, and AI-assisted developer tools.
- Develop evaluation pipelines to measure factual accuracy, hallucination rates, bias, robustness, and overall model reliability.
- Define and enforce metrics-driven quality gates and experiment tracking workflows to ensure consistent, data-informed releases.
- Collaborate with agile engineering teams, participating in design discussions, code reviews, and architecture decisions to drive testability and prevent defects early ('shift left').
- Develop monitoring and alerting systems to track LLM production quality, safety, and performance in real time.
- Conduct robustness, safety, and adversarial testing to validate AI behavior under edge cases and stress scenarios.
- Continuously improve frameworks, tools, and processes for LLM reliability, safety, and reproducibility.
- Mentor junior engineers in AI testing, automation, and quality best practices.
- Measure and improve Developer Experience (DevEx) through tools, feedback loops, and automation.
- Champion quality engineering practices across the organization, ensuring delivery meets business goals, user experience, cost of operations etc.
We'd love to hear from you, if you :
LLM testing & evaluation tools : MaximAI, OpenAI Evals, TruLens, Promptfoo, LangSmithBuilding LLM-powered apps : prompt pipelines, embeddings, RAG, AI workflowsCI / CD design for application + LLM testingAPI, performance, and system testingGit, Docker, and cloud platforms (AWS / GCP / Azure)Bias, fairness, hallucination detection & AI safety testingMentorship and cross-functional leadershipPreferred Qualifications
Bachelor's or Master's in Computer Science, Engineering, or equivalent.4+ years in software development, SDET, or QA automation.Proficiency in GoLang, Java, or Python.Proven experience building test automation frameworks.Proven ability to design CI / CD pipelines with automated regression and evaluation testing.Hands-on exposure to LLMs, GenAI applications.2+ years of hands-on experience with LLM APIs and frameworks (OpenAI, Anthropic, Hugging Face).Proficient in prompt engineering, embeddings, RAG, and LLM evaluation metrics.Strong analytical, leadership, and teamwork skills.Excellent communication and collaboration across teams.Skills Required
Docker, Python, Aws, Java, Git, Gcp, Azure, Golang