Request for Proposals, AI Interpretability (2026)
Schmidt Sciences
Posted: Mar 18, 2026
Location: Remote
Type: Full-time
Compensation: Up to $9999999
Deadline: May 26, 2026
Mission
What you will drive
- Receive funding to develop interpretability methods that detect deceptive behaviours in LLMs and steer their reasoning to eliminate these behaviours.
- Develop tools for detecting deceptive behaviours where model outputs contradict internal representations.
- Create steering methods for intervening on model truthfulness using mechanistic understanding.
- Apply detection and steering techniques to real-world use cases and human-AI teams.
- Evaluate methods on realistic scenarios beyond academic benchmarks to prove generalisation.
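To make the detection and steering goals above concrete, here is a minimal illustrative sketch of one common baseline from the interpretability literature: a "mass-mean" linear probe that flags a deception-like direction in model activations, followed by a simple steering intervention that projects that direction out. Everything here is hypothetical — the synthetic data, the dimension, and the probe itself are assumptions for illustration, not a method specified by this RFP.

```python
# Hedged sketch: linear probing + activation steering on SYNTHETIC data.
# None of this is Schmidt Sciences' method; it only illustrates the kind
# of detection/steering pipeline the call describes.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                  # hypothetical hidden-state dimension
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)    # ground-truth "deception" direction

# Synthetic "activations": deceptive examples are shifted along true_dir.
truthful = rng.normal(size=(500, d))
deceptive = rng.normal(size=(500, d)) + 4.0 * true_dir

# Probe direction = difference of class means (mass-mean probing).
probe = deceptive.mean(axis=0) - truthful.mean(axis=0)
probe /= np.linalg.norm(probe)
threshold = 0.5 * ((deceptive @ probe).mean() + (truthful @ probe).mean())

def is_deceptive(acts):
    """Flag activations whose projection onto the probe exceeds the threshold."""
    return acts @ probe > threshold

# Detection accuracy on held-out synthetic data.
test_t = rng.normal(size=(200, d))
test_d = rng.normal(size=(200, d)) + 4.0 * true_dir
acc = 0.5 * ((~is_deceptive(test_t)).mean() + is_deceptive(test_d).mean())

# Steering: remove the probe component, pushing activations toward "truthful".
steered = test_d - np.outer(test_d @ probe, probe)
```

On this toy data the probe separates the two classes almost perfectly, and the steered activations no longer trip the detector. Real work under this call would of course have to show the same on genuine LLM activations and realistic deception scenarios, which is exactly the generalisation question the last bullet raises.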
About
Inside Schmidt Sciences
Schmidt Sciences is a philanthropic organisation dedicated to fostering the advancement of science and technology. Its program areas include AI and Advanced Computing, Astrophysics and Space, Biosciences, Climate, and Science Systems.