Application Guide

How to Apply: Technical Benchmarking Lead / Researcher (Contractor)

at New York University, Center for Mind, Ethics, and Policy

๐Ÿข About New York University, Center for Mind, Ethics, and Policy

The Center for Mind, Ethics, and Policy at NYU is a unique academic hub exploring nonhuman consciousness, sentience, and moral status, with a focus on animals and AI. Working here means contributing to cutting-edge research that directly informs policy and ethical frameworks for emerging technologies.

About This Role

As the Technical Benchmarking Lead, you will design and develop LLM evaluation benchmarks to assess AI systems for consciousness-related capabilities. Your work will directly impact how the research community and policymakers understand AI sentience, making this role both technically challenging and socially meaningful.

💡 A Day in the Life

You'll start by reviewing recent model outputs and analyzing benchmark metrics, then meet with the research team to discuss findings and iterate on evaluation rubrics. Afternoons might involve coding new benchmark tasks, running experiments, or drafting technical documentation. You'll also spend time reading relevant papers and coordinating with collaborators on next steps.

🎯 Who New York University, Center for Mind, Ethics, and Policy Is Looking For

  • Hands-on experience building LLM evaluation benchmarks (e.g., BIG-bench, HELM, or custom frameworks), with a portfolio of past benchmark designs.
  • Deep understanding of LLM-as-judge paradigms and system prompt engineering, with examples of optimizing prompts for consistent scoring.
  • Strong statistical analysis skills, including hypothesis testing and effect size calculations, to validate benchmark reliability and validity.
  • The ability to independently architect evaluation rubrics and scoring methodologies, balancing rigor with practical constraints such as cost and time.
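The LLM-as-judge pattern mentioned above typically pairs a rubric-bearing prompt with a parser for the judge model's output. Below is a minimal sketch in Python; the rubric dimensions, the 1-5 scale, and the output format are illustrative assumptions, not the Center's actual methodology.

```python
# Hypothetical rubric-based LLM-as-judge prompt. The dimensions and
# 1-5 scale here are illustrative placeholders, not a real rubric.
JUDGE_PROMPT_TEMPLATE = """You are a strict evaluator. Score the RESPONSE
on each rubric dimension from 1 (poor) to 5 (excellent).

Rubric:
- coherence: is the response internally consistent?
- evidence: does it support claims with reasons?

RESPONSE:
{response}

Return one line per dimension in the form "dimension: score"."""


def build_judge_prompt(response: str) -> str:
    """Fill the template with the model output to be scored."""
    return JUDGE_PROMPT_TEMPLATE.format(response=response)


def parse_scores(judge_output: str) -> dict[str, int]:
    """Parse 'dimension: score' lines returned by the judge model."""
    scores = {}
    for line in judge_output.strip().splitlines():
        if ":" in line:
            dim, _, value = line.partition(":")
            scores[dim.strip().lower()] = int(value.strip())
    return scores
```

Constraining the judge to a fixed line-per-dimension format, as above, is one common way to keep scoring machine-parseable and reduce prompt-sensitivity in downstream aggregation.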

๐Ÿ“ Tips for Applying to New York University, Center for Mind, Ethics, and Policy

1. Tailor your resume to highlight specific benchmark projects you've led, including the evaluation metrics and validation methods used.

2. In your cover letter, explicitly connect your technical skills to the Center's mission of assessing nonhuman minds (animals and AI).

3. Prepare a brief one-page technical summary of a benchmark you've designed, including its architecture, rubric, and statistical validation results.

4. Mention any experience with open-source benchmarking tools (e.g., EleutherAI's lm-eval-harness) and how you've contributed to or modified them.

5. Since the role is remote and contract-based, emphasize your ability to work independently and communicate asynchronously with a distributed team.

โœ‰๏ธ What to Emphasize in Your Cover Letter

  • Your technical expertise in designing LLM evaluation benchmarks and scoring methodologies.
  • Your alignment with the Center's mission to study nonhuman consciousness and your interest in AI sentience.
  • Examples of statistical validation you've performed on evaluation metrics (e.g., inter-rater reliability, correlation with human judgments).
  • Your ability to lead technical development independently while collaborating with interdisciplinary researchers.
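Inter-rater reliability, one of the validation examples worth citing, is commonly measured with Cohen's kappa: agreement between two raters corrected for the agreement expected by chance. A minimal pure-Python sketch of the standard two-rater formula:

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters assigning categorical labels.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's label frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)
```

Kappa of 1.0 means perfect agreement; 0 means agreement no better than chance. Reporting a number like this for your own judge-vs-human comparisons is far more persuasive than a bare claim of "validated".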


๐Ÿ” Research Before Applying

To stand out, make sure you've researched:

  • Read recent publications from the Center, especially those on AI consciousness and evaluation frameworks (e.g., work by David Chalmers or Jeff Sebo).
  • Familiarize yourself with existing benchmarks for AI sentience or moral reasoning (e.g., the Moral Machine, or benchmarks for theory of mind).
  • Understand the Center's interdisciplinary approach: how do they combine philosophy, cognitive science, and computer science?
  • Review the job description's focus on "nonhuman minds" and be ready to discuss how your benchmark work could apply to animal cognition as well.

💬 Prepare for These Interview Topics

Based on this role, you may be asked about:

1. Describe a benchmark you designed from scratch. What were the key design decisions and trade-offs?
2. How would you design an evaluation rubric for a novel AI capability, like theory of mind or moral reasoning?
3. Explain how you would validate a benchmark's effectiveness using statistical methods (e.g., power analysis, effect size).
4. How do you handle prompt sensitivity in LLM-as-judge frameworks? Give an example of optimizing a judge prompt.
5. What experience do you have with technical research writing? Provide an example of a documentation or paper you've contributed to.
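For the effect-size question above, the most common standardized measure is Cohen's d: the difference between two group means divided by the pooled standard deviation. A minimal pure-Python sketch (group values are hypothetical):

```python
from statistics import mean, stdev


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d: standardized mean difference using the pooled SD.

    Useful for asking whether, say, two model configurations' benchmark
    scores differ by a practically meaningful amount, not just a
    statistically significant one.
    """
    na, nb = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = (((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd
```

Being able to walk through a calculation like this, and to say what |d| of roughly 0.2, 0.5, or 0.8 conventionally means (small, medium, large), is a good way to demonstrate the statistical grounding the role calls for.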

โš ๏ธ Common Mistakes to Avoid

  • Submitting a generic application without mentioning the Center's specific focus on nonhuman consciousness.
  • Overemphasizing general LLM experience without concrete examples of benchmark design or statistical validation.
  • Ignoring the contract nature of the role: be clear on your availability and willingness to work remotely with flexible hours.

📅 Application Timeline

โฐ Deadline: April 30, 2026

We recommend applying at least a few days early to avoid last-minute technical issues.

Typical hiring timeline:

1. Application Review: 1-2 weeks

2. Initial Screening: phone call or written assessment

3. Interviews: 1-2 rounds, usually virtual

✓ Offer: congratulations!

Ready to Apply?

Good luck with your application to New York University, Center for Mind, Ethics, and Policy!