AI Safety & Governance Contract

Technical Benchmarking Lead / Researcher (Contractor)

New York University, Center for Mind, Ethics, and Policy

Posted

Apr 17, 2026

Location

Remote (US)

Type

Contract

Compensation

Up to $312,000

Deadline

Apr 30, 2026

Mission

What you will drive

Core responsibilities:

  • Lead the technical development of LLM evaluation benchmarks and coordinate with key team members
  • Independently design and develop benchmark architectures, evaluation rubrics, and scoring methodologies
  • Apply LLM-as-judge frameworks and system prompt engineering to evaluation pipeline design
  • Conduct statistical analysis of model outputs to validate benchmark effectiveness
  • Contribute to technical research writing and documentation

Impact

The difference you'll make

This role advances robust evaluation frameworks for large language models, supporting ethical AI development and policy through technical benchmarking research.

Profile

What makes you a great fit

Required skills and qualifications:

  • Hands-on experience developing LLM evaluation benchmarks
  • Demonstrated ability to design benchmark architectures, evaluation rubrics, and scoring methodologies
  • Working knowledge of LLM-as-judge frameworks and system prompt engineering
  • Statistical analysis skills for validating benchmark effectiveness
  • Strong technical research writing and documentation skills

Benefits

What's in it for you

Compensation of up to $312,000 is listed above; the posting does not mention additional perks or culture highlights.

About

Inside New York University, Center for Mind, Ethics, and Policy


New York University's Center for Mind, Ethics, and Policy focuses on research at the intersection of cognitive science, ethics, and policy, particularly regarding emerging technologies like AI.