Job Description:
We are seeking highly analytical and detail-oriented professionals with hands-on experience in Red Teaming, Prompt Evaluation, and AI/LLM Quality Assurance. The ideal candidate will help us rigorously test and evaluate AI-generated content to identify vulnerabilities, assess risks, and ensure compliance with safety, ethical, and quality standards.
Job Title: Red Teaming, Prompt Evaluation, or AI/LLM Quality Assurance Expert
Key Responsibilities:
- Conduct Red Teaming exercises to identify adversarial, harmful, or unsafe outputs from large language models (LLMs).
- Evaluate and stress-test AI prompts across multiple domains (e.g., finance, healthcare, security) to uncover potential failure modes.
- Develop and apply test cases to assess accuracy, bias, toxicity, hallucinations, and misuse potential in AI-generated responses.
- Collaborate with data scientists, safety researchers, and prompt engineers to report risks and suggest mitigations.
- Perform manual QA and content validation across model versions, ensuring factual consistency, coherence, and guideline adherence.
- Create evaluation frameworks and scoring rubrics for prompt performance and safety compliance.
- Document findings, edge cases, and vulnerability reports with high clarity and structure.
Requirements:
- Proven experience in AI red teaming, LLM safety testing, or adversarial prompt design.
- Familiarity with prompt engineering, NLP tasks, and ethical considerations in generative AI.
- Strong background in Quality Assurance, content review, or test case development for AI/ML systems.
- Understanding of LLM behaviours, failure modes, and model evaluation metrics.
- Excellent critical thinking, pattern recognition, and analytical writing skills.
- Ability to work independently, follow detailed evaluation protocols, and meet tight deadlines.
Preferred Qualifications:
- Prior work with teams like OpenAI, Anthropic, Google DeepMind, or other LLM safety initiatives.
- Experience in risk assessment, red team security testing, or AI policy and governance.
- Background in linguistics, psychology, or computational ethics is a plus.
Please complete the assessment below to proceed with your application.
https://icap.innodata.com/registerfreelancer?enc=oUTZVsr/Pnz/0Xygc2EK32MdtinqnjC9vy8RU3Ha4EMfQdYWF8n1oAiUtOZXMcKhhhsYYecqmKWja8eKXV801gezikielezikiel