AI Safety & Governance

Request for Proposals, AI Interpretability (2026)

Schmidt Sciences

Posted

Mar 18, 2026

Location

Remote

Type

Full-time

Compensation

Up to $9999999

Deadline

May 26, 2026

Mission

What you will drive

  • Receive funding to develop interpretability methods that detect deceptive behaviours in LLMs and steer their reasoning to eliminate these behaviours.
  • Develop tools for detecting deceptive behaviours where model outputs contradict internal representations.
  • Create steering methods for intervening on model truthfulness using mechanistic understanding.
  • Apply detection and steering techniques to real-world use cases and human-AI teams.
  • Evaluate methods on realistic scenarios beyond academic benchmarks to prove generalisation.
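The detect-then-steer pipeline described above can be illustrated with a toy linear probe on activation vectors. Everything below is a hypothetical sketch: the synthetic data, the difference-of-means probe, and the `alpha` steering scale are illustrative assumptions, not methods specified by the RFP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for hidden states: synthetic "truthful" and "deceptive"
# activations that differ along one (unknown) direction.
d = 16
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

truthful = rng.normal(size=(200, d)) + 2.0 * true_dir
deceptive = rng.normal(size=(200, d)) - 2.0 * true_dir

# 1. Detection: a difference-of-means probe recovers the direction
#    separating the two classes of activations.
probe = truthful.mean(axis=0) - deceptive.mean(axis=0)
probe /= np.linalg.norm(probe)

def detect(h):
    """Score > 0 suggests a 'truthful' activation under this toy probe."""
    return float(h @ probe)

# 2. Steering: shift an activation along the probe direction.
def steer(h, alpha=4.0):
    return h + alpha * probe

h = deceptive[0]
assert detect(h) < detect(steer(h))  # steering raises the truthfulness score
```

In a real system the activations would come from a transformer's residual stream rather than a Gaussian toy model, and the probe would typically be a trained classifier; the sketch only shows the shape of the detect/steer loop.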

Profile

What makes you a great fit

  • Track record in LLM interpretability research, ideally touching on deception or truthfulness.
  • Experience building tools that compare model outputs against internal representations.
  • Familiarity with mechanistic steering or activation-intervention methods.
  • Interest in applying detection and steering techniques to real-world use cases and human-AI teams.
  • Commitment to evaluating methods on realistic scenarios beyond academic benchmarks.

About

Inside Schmidt Sciences


Schmidt Sciences is a philanthropic organisation dedicated to advancing science and technology. Its programs span AI and Advanced Computing, Astrophysics and Space, Biosciences, Climate, and Science Systems.