ML/Research Engineer, Safeguards
Anthropic
Location: USA
Type: Full-time
Posted: Jan 15, 2026
Compensation: USD 350,000 – 500,000
Mission
What you will drive
Core responsibilities:
- Develop classifiers to detect misuse and anomalous behavior at scale, including synthetic data pipelines and methods for sourcing evaluations
- Build systems to monitor for harms spanning multiple exchanges, such as coordinated cyber attacks and influence operations
- Evaluate and improve the safety of agentic products by developing threat models, test environments, and mitigations for prompt injection attacks
- Conduct research on automated red-teaming, adversarial robustness, and other methods to test for and uncover misuse
Impact
The difference you'll make
This role helps detect and mitigate misuse of AI systems, protecting user wellbeing and ensuring models behave appropriately across contexts. It directly supports Anthropic's Responsible Scaling Policy commitments to creating safe and beneficial AI.
Profile
What makes you a great fit
Required qualifications:
- 4+ years of experience in ML engineering, research engineering, or applied research in academia or industry
- Proficiency in Python and experience building ML systems
- Comfort working across the research-to-deployment pipeline, from exploratory experiments to production systems
- Strong communication skills and ability to explain complex technical concepts to non-technical stakeholders
Benefits
What's in it for you
Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space for collaboration. The annual salary range is $350,000–$500,000 USD.
About
Inside Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. Its team of researchers, engineers, policy experts, and business leaders works together on high-impact AI research.