Contractor, Technical Benchmarking Lead / Researcher
New York University, Center for Mind, Ethics, and Policy
Posted
Apr 17, 2026
Location
Remote (US)
Type
Contract
Compensation
Up to $312,000
Deadline
Apr 30, 2026
Mission
What you will drive
Core responsibilities:
- Lead the technical development of LLM evaluation benchmarks and coordinate with key team members
- Design and develop benchmark architectures, evaluation rubrics, and scoring methodologies independently
- Apply LLM-as-judge frameworks and system prompt engineering in evaluation pipeline design
- Conduct statistical analysis of model outputs to validate benchmark effectiveness
- Contribute to technical research writing and documentation
Impact
The difference you'll make
This role supports the development of robust evaluation frameworks for large language models, advancing ethical AI development and policy through technical benchmarking research.
Profile
What makes you a great fit
Required skills and qualifications:
- Experience with LLM evaluation benchmarks and technical development
- Ability to design benchmark architectures, evaluation rubrics, and scoring methodologies
- Knowledge of LLM-as-judge frameworks and system prompt engineering
- Statistical analysis skills for validating benchmark effectiveness
- Technical research writing and documentation capabilities
Benefits
What's in it for you
Compensation of up to $312,000 is stated above; the posting mentions no additional perks or culture highlights.
About
Inside New York University, Center for Mind, Ethics, and Policy
New York University's Center for Mind, Ethics, and Policy focuses on research at the intersection of cognitive science, ethics, and policy, particularly regarding emerging technologies like AI.