[Expression of Interest] Research Scientist/Engineer, Honesty
Anthropic
Posted
Dec 13, 2025
Location
USA
Type
Full-time
Compensation
$350,000 - $500,000
Mission
What you will drive
- Design and implement novel data curation pipelines to identify, verify, and filter training data for accuracy given the model's knowledge
- Develop specialized classifiers to detect potential hallucinations or miscalibrated claims made by the model
- Create and maintain comprehensive honesty benchmarks and evaluation frameworks
- Implement techniques to ground model outputs in verified information, such as search and retrieval-augmented generation (RAG) systems
- Design and deploy human feedback collection specifically for identifying and correcting miscalibrated responses
- Design and implement prompting pipelines to generate data that improves model accuracy and honesty
- Develop and test novel RL environments that reward truthful outputs and penalize fabricated claims
- Create tools to help human evaluators efficiently assess model outputs for accuracy
Impact
The difference you'll make
Your work will be critical for ensuring our models maintain high standards of accuracy and honesty across diverse domains, helping create reliable, interpretable, and steerable AI systems that are safe and beneficial for society.
Profile
What makes you a great fit
- Have an MS/PhD in Computer Science, ML, or related field
- Possess strong programming skills in Python
- Have industry experience with language model finetuning and classifier training
- Show proficiency in experimental design and statistical analysis for measuring improvements in calibration and accuracy
- Care about AI safety and the accuracy and honesty of both current and future AI systems
- Have experience in data science or the creation and curation of datasets for finetuning LLMs
- Understand various metrics of uncertainty, calibration, and truthfulness in model outputs
Benefits
What's in it for you
Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.
About
Inside Anthropic
Anthropic is a frontier AI research and product company, with teams working on alignment, policy, and security. We post specific opportunities at Anthropic that we think may be high impact. We do not necessarily recommend working in other positions at Anthropic. You can read concerns about doing harm by working at a frontier AI company in our career review on the topic.