[Expression of Interest] Research Scientist/Engineer, Honesty

Anthropic

Posted

Dec 13, 2025

Location

USA

Type

Full-time

Compensation

$350,000 - $500,000

Mission

What you will drive

Design and implement novel data curation pipelines to identify, verify, and filter training data for accuracy given the model's knowledge
Develop specialized classifiers to detect potential hallucinations or miscalibrated claims made by the model
Create and maintain comprehensive honesty benchmarks and evaluation frameworks
Implement techniques to ground model outputs in verified information, such as search and retrieval-augmented generation (RAG) systems
Design and deploy human feedback collection specifically for identifying and correcting miscalibrated responses
Design and implement prompting pipelines to generate data that improves model accuracy and honesty
Develop and test novel RL environments that reward truthful outputs and penalize fabricated claims
Create tools to help human evaluators efficiently assess model outputs for accuracy

Impact

The difference you'll make

Your work will be critical for ensuring our models maintain high standards of accuracy and honesty across diverse domains, helping create reliable, interpretable, and steerable AI systems that are safe and beneficial for society.

Profile

What makes you a great fit

Have an MS/PhD in Computer Science, ML, or related field
Possess strong programming skills in Python
Have industry experience with language model finetuning and classifier training
Show proficiency in experimental design and statistical analysis for measuring improvements in calibration and accuracy
Care about AI safety and the accuracy and honesty of both current and future AI systems
Have experience in data science or the creation and curation of datasets for finetuning LLMs
Understand various metrics of uncertainty, calibration, and truthfulness in model outputs

Benefits

What's in it for you

Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.

About

Inside Anthropic

Anthropic is a frontier AI research and product company, with teams working on alignment, policy, and security. We post specific opportunities at Anthropic that we think may be high impact. We do not necessarily recommend working in other positions at Anthropic. You can read about concerns over doing harm by working at a frontier AI company in our career review on the topic.
