At Vaikhari AI, we are building the trust layer for AI. Our mission is to ensure AI systems behave reliably across languages, cultures, and real-world conditions. We design benchmarks, datasets, and tools that enable enterprises and researchers to measure, stress-test, and improve their AI systems, with a special focus on underrepresented and low-resource languages.
Our founders are researchers and operators from BigTech and unicorn startups, experienced in shipping at enterprise scale while executing with startup agility.
Role Description
We are looking for a Prompt Engineer with a strong technical background who deeply understands LLM behaviour and can design, test, and exploit emergent patterns in multi-modal and generative AI systems. This role combines research, experimentation, and engineering, and is ideal for someone who can move seamlessly between conceptual design, empirical validation, and real-world implementation.
You will work closely with our AI scientists and data teams to design robust, generalizable prompt strategies, study model reasoning behaviour, and create evaluation setups that stress-test AI capabilities in complex or multilingual environments.
Responsibilities
- Design, test, and refine prompts and prompt pipelines for text and multi-modal models.
- Investigate and document emergent behaviours and their reproducibility across model families and scales.
- Conduct structured experiments to understand how prompt structure, context, and modality influence model outputs.
- Collaborate with research and engineering teams to build evaluation and trust metrics for generative AI systems.
- Develop reusable prompt frameworks and templates for diverse tasks and domains.
- Drive experiments independently, from hypothesis to insight, in ambiguous or rapidly evolving problem spaces.
What We’re Looking For
- 5+ years of experience in AI/ML and at least 2 years in Generative AI or LLM-based systems.
- Strong understanding of LLM internals, emergent behaviours, and reasoning limitations.
- Experience with prompt engineering for text or multi-modal models (e.g., GPT, Claude, Gemini, LLaVA, or similar).
- Proficiency in Python and comfort with frameworks such as LangChain, LlamaIndex, or OpenAI SDKs.
- Familiarity with evaluation, data curation, and fine-tuning processes for LLMs.
- Ability to work independently, design experiments, and derive insights from unstructured exploration.
- Curiosity about language, cognition, and human-AI interaction.
Why Join Us?
- Shape the trust layer for AI, working on problems that matter globally.
- Be part of a founding-stage deeptech startup with exposure to both cutting-edge research and client-facing impact.
- Work directly with a leadership team that has shipped at BigTech scale and built with startup speed.
- Influence how the world evaluates, trusts, and deploys AI systems.