ML/Research Engineer, Safeguards
Anthropic
Location: USA
Type: Full-time
Posted: Jan 15, 2026
Compensation: USD 350,000 – 500,000
Mission
What you will drive
Core responsibilities:
- Develop classifiers to detect misuse and anomalous behavior at scale, including synthetic data pipelines and methods for sourcing evaluations
- Build systems to monitor for harms spanning multiple exchanges, such as coordinated cyber attacks and influence operations
- Evaluate and improve the safety of agentic products by developing threat models, test environments, and mitigations for prompt injection attacks
- Conduct research on automated red-teaming, adversarial robustness, and other methods to test for and uncover misuse
Impact
The difference you'll make
This role helps detect and mitigate misuse of AI systems, protecting user wellbeing and ensuring models behave appropriately across contexts. It directly supports Anthropic's Responsible Scaling Policy commitments to creating safe and beneficial AI.
Profile
What makes you a great fit
Required qualifications:
- 4+ years of experience in ML engineering, research engineering, or applied research in academia or industry
- Proficiency in Python and experience building ML systems
- Comfort working across the research-to-deployment pipeline, from exploratory experiments to production systems
- Strong communication skills and ability to explain complex technical concepts to non-technical stakeholders
Benefits
What's in it for you
Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space for collaboration. The annual salary range is $350,000–$500,000 USD.
About
Inside Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. Its team of researchers, engineers, policy experts, and business leaders works together on high-impact AI research.