Research Scientist – Science of Evaluation
AI Security Institute (AISI)
Posted
Feb 10, 2026
Location
UK
Type
Full-time
Compensation
£65,000 - £145,000
Mission
What you will drive
Core responsibilities:
- Conduct applied research on evaluation methodology, including developing new techniques and tools for measuring AI capabilities
- Design and conduct experiments that extract deeper signal from evaluation data to uncover underlying capabilities
- Run evaluations and analyze the results to stress-test claims, characterize model capabilities, and inform policy-relevant reports
- Track the state of the art in frontier AI evaluation research and contribute to AISI's presence at ML conferences
Impact
The difference you'll make
In this role you will develop rigorous techniques for measuring and forecasting AI capabilities, ensuring that evaluation results are robust, meaningful, and useful for governance. This work directly influences how frontier AI is governed and deployed globally.
Profile
What makes you a great fit
Required qualifications:
- Strong track record in applied ML, evaluation science, or experimental fields with significant methodological challenges (PhD in a technical field, top-tier publications, or substantial real-world deployments)
- Significant hands-on experience with LLMs and agents
- Strong motivation for impactful work at the intersection of science, safety, and governance
- Self-directed and adaptable; comfortable with ambiguity in a growing team
Benefits
What's in it for you
Compensation and benefits:
- Salary range: £65,000–£145,000 depending on level and experience
- 28.97% employer pension contribution on base salary
- At least 25 days annual leave, 8 public holidays, extra team-wide breaks, and 3 days off for volunteering
- Generous paid parental leave (36 weeks UK statutory leave + 3 extra paid weeks)
- Hybrid working with flexibility for occasional remote work abroad
- Modern central London office or option to work in other UK government offices
- 5 days off for learning and development, annual stipends for learning, and conference funding
- Pre-release access to multiple frontier models and ample compute resources