We are looking for a senior-level Data Scientist to drive experimentation, evaluation, and AI/LLM-powered product improvements. In this role, you will act as a strategic partner to product, engineering, and trust & safety teams. You will be responsible for defining evaluation frameworks, leading experiments (A/B tests, quasi-experiments, etc.), and translating both offline and live model performance into actionable product enhancements.
The ideal candidate will have a strong track record in startup-style experimentation (moving quickly and efficiently while applying rigorous methods), as well as experience running product experimentation at scale. Proven expertise in leading and managing teams to deliver high-impact data science outcomes is highly desirable.
Essential Job Functions
- Lead end-to-end experimentation: hypothesis generation, metric design, experiment design (A/B, multivariate, sequential, etc.), analysis, and interpretation
- Build and maintain evaluation frameworks for LLMs: correctness, consistency, safety, hallucination detection, bias/fairness, etc.
- Develop predictive models, classification/ranking systems, and heuristics to improve product features related to AI/language generation
- Collaborate with prompt engineers and model builders to test prompt strategies, fine-tuning, or model selection; work on failure modes and error analysis
- Automate experiment pipelines: dashboards, monitoring, alerting, and instrumentation. Ensure data quality and measurement integrity
- Use causal inference and observational studies when randomized experiments are not feasible
- Present findings and recommendations to both technical and non-technical leadership; influence roadmap decisions
- Drive experimentation in startup-like environments: rapid iteration, learning from limited data, and balancing speed with rigor
- Shape large-scale product experimentation : define frameworks for experimentation at scale and integrate results into product strategy
- Lead and mentor teams of data scientists, analysts, and engineers; set best practices for experiment design and AI product evaluation
Requirements
- 8-12+ years of experience in data science/ML roles, ideally with experiment design or product analytics
- Proven track record in both startup-style and large-scale product experimentation
- Experience leading teams, setting strategy, and driving execution in cross-functional environments
- Strong background in statistical methods, causal inference, and rigorous measurement
- Experience with LLMs, NLP, AI, or prompt engineering, or a closely related field
- Excellent coding skills in Python (or similar) and strong SQL; experience building and deploying models or analytic pipelines
- Ability to work in cross-functional teams and translate technical results into business or product changes
- Strong communication skills; ability to explain complex analyses to non-technical stakeholders
Nice to have
- Experience fine-tuning or working with multiple LLM providers/APIs
- Experience with experiment platforms or building internal tooling for experimentation and model evaluation
- Experience with voice/ASR or other multi-modal data
Working Terms
Candidates must be flexible and able to work during US hours, at least until 6 p.m. ET, which is essential for this role. Candidates must also have their own computer and work setup for remote work.
Skills Required
Data Science, Causal Inference, AI, SQL, Python