AI Safety & Governance Contract

Technical Benchmarking Lead / Researcher (Contractor)

New York University, Center for Mind, Ethics, and Policy

Posted

Apr 17, 2026

Location

Remote (US)

Type

Contract

Compensation

Up to $312,000

Deadline

Apr 30, 2026

Mission

What you will drive

Core responsibilities:

  • Lead the technical development of LLM evaluation benchmarks and coordinate with key team members
  • Independently design and develop benchmark architectures, evaluation rubrics, and scoring methodologies
  • Apply LLM-as-judge frameworks and system prompt engineering to evaluation pipeline design
  • Conduct statistical analysis of model outputs to validate benchmark effectiveness
  • Contribute to technical research writing and documentation

Impact

The difference you'll make

This role advances robust evaluation frameworks for large language models, supporting ethical AI development and policy through technical benchmarking research.

Profile

What makes you a great fit

Required skills and qualifications:

  • Hands-on experience developing LLM evaluation benchmarks
  • Demonstrated ability to design benchmark architectures, evaluation rubrics, and scoring methodologies
  • Working knowledge of LLM-as-judge frameworks and system prompt engineering
  • Statistical analysis skills for validating benchmark effectiveness
  • Strong technical research writing and documentation skills

Benefits

What's in it for you

Compensation of up to $312,000 is listed above; the posting does not mention additional perks or culture highlights.

About

Inside New York University, Center for Mind, Ethics, and Policy


New York University's Center for Mind, Ethics, and Policy focuses on research at the intersection of cognitive science, ethics, and policy, particularly regarding emerging technologies like AI.