Our organization is seeking a skilled professional to conduct in-depth analysis and testing of AI-generated content. The ideal candidate will have hands-on experience with Red Teaming, Prompt Evaluation, and Quality Assurance for large language models.
- The primary objective of this role is to rigorously test and evaluate AI-generated content, identifying vulnerabilities and assessing risks to ensure compliance with safety, ethical, and quality standards.
Key Responsibilities :
Conducting rigorous Red Teaming exercises to identify harmful or unsafe outputs from large language models.Evaluating and stress-testing AI prompts across multiple domains, including finance, healthcare, and security.Developing and applying test cases to assess accuracy, bias, toxicity, hallucinations, and misuse potential in AI-generated responses.Collaborating with data scientists, safety researchers, and prompt engineers to report risks and suggest mitigations.Performing manual Quality Assurance and content validation, ensuring factual consistency, coherence, and guideline adherence.Creating evaluation frameworks, scoring rubrics, and prompt performance safety compliance documents.Documenting findings, edge cases, and vulnerability reports in a clear and structured format.Requirements :
Proven experience in AI Red Teaming, LLM safety testing, and adversarial prompt design.Familiarity with prompt engineering, NLP tasks, and ethical considerations in generative AI.Strong background in Quality Assurance and content review, including test case development for AI / ML systems.Understanding of LLM behavior, failure modes, and model evaluation metrics.Excellent critical thinking, pattern recognition, and analytical writing skills.Ability to work independently and follow detailed evaluation protocols while meeting tight deadlines.Preferred Qualifications :
Prior work experience in teams focused on Open AI, Anthropic, Google DeepMind, or other leading LLM safety initiatives.Experience in risk assessment, Red Team security testing, and AI policy governance.