About the Role :
We are looking for native US English speakers with experience working with Large Language Models (LLMs). The project involves watching YouTube videos and generating 2 to 4 complex queries per video, along with golden answers. These should be questions that advanced models like Gemini 2.5 Pro are unlikely to answer correctly.
Project Description :
- Review YouTube videos of varying lengths
- Create 2–4 complex queries per video
- Provide “golden answers” (high-quality, ground truth answers)
- Focus on creating edge cases that challenge advanced LLMs
- Each video task is estimated to take ~30 minutes
- Duration : 4 weeks
- Weekend work may be required.
Preferred Qualifications :
- Language Proficiency : US English
- Experience : 4+ years working with LLMs or in AI / ML data annotation / evaluation (preferred but not mandatory)
- Location : United States-based candidates only
Immediate Action Required :
- Sample Profiles : Request submission of sample candidate profiles for evaluation.
- Shortlisting Criteria : Candidates will undergo an assessment for shortlisting.
- Tentative Rates : Please provide estimated hourly rates for each proposed candidate based on experience and expertise.
- Timelines to Source : Share the expected timeline to source 40 qualified profiles.
Candidate’s Profile :
- Graduate with 4+ years of experience with LLMs and golden answers
- Ready for part-time (4 hours daily) engagement
- Ready to work for 4 to 6 weeks
- Can join immediately