Mercor is hiring a Technical Reviewer on behalf of a leading AI lab to evaluate and refine benchmarking pipelines for reinforcement learning (RL) environments and agentic AI systems. In this role, you’ll be responsible for reviewing environment design, terminal conditions, and evaluation protocols to ensure accuracy, reproducibility, and fairness in benchmarking. You’ll work closely with researchers and engineers to provide technical feedback that strengthens experimental rigor and system reliability.
### You’re a great fit if you:
- Have a background in reinforcement learning, computer science, or applied AI research.
- Are experienced with RL environments.
- Understand benchmarking methodologies, terminal conditions, and evaluation metrics for RL tasks.
- Are comfortable reading and reviewing codebases in Python (PyTorch / TensorFlow a plus).
- Have strong critical thinking skills and can provide structured technical feedback.
- Care deeply about experimental reproducibility, fairness, and standardization in agentic AI.
- Are detail-oriented and capable of reviewing both theoretical formulations and implementation details.
### Primary Goal of This Role
To review, validate, and improve reinforcement learning environment benchmarking pipelines, ensuring that terminal conditions, evaluation metrics, and system behaviors are robust, reproducible, and aligned with agentic AI research goals.
### What You’ll Do
- Review RL environments and evaluate terminal conditions for correctness and consistency.
- Assess benchmarking pipelines for fairness, reproducibility, and alignment with research objectives.
- Provide structured technical feedback on code implementations and documentation.
- Collaborate with researchers to refine evaluation metrics and methodologies.
- Ensure reproducibility by validating results across different runs, seeds, and hardware setups.
- Document findings and recommend improvements for environment design and benchmarking standards.
### Why This Role Is Exciting
- You’ll directly influence the reliability of benchmarking in agentic AI research.
- You’ll work on cutting-edge RL environments that test the limits of intelligent agents.
- You’ll help establish standards for evaluation and reproducibility in a fast-moving field.
- You’ll collaborate with researchers shaping the future of agentic AI systems.
### Pay & Work Structure
- You’ll be classified as a full-time hourly contractor to Mercor.
- You’ll be paid weekly via Stripe Connect, based on hours logged.
- 40 hours/week commitment with flexible scheduling.
- Remote, flexible working style.