Software Engineer, Safeguards
Anthropic
Posted
Dec 13, 2025
Location
USA
Type
Full-time
Compensation
$320,000 - $425,000
Mission
What you will drive
Core responsibilities:
- Develop monitoring systems that detect unwanted behaviors from API partners, take automated enforcement actions where appropriate, and surface findings in internal dashboards for manual review by analysts
- Build abuse detection mechanisms and infrastructure
- Surface abuse patterns to research teams to harden models at the training stage
- Build robust, reliable, multi-layered defenses that improve safety mechanisms in real time and work at scale
Impact
The difference you'll make
This role creates positive change by building safety and oversight mechanisms for AI systems: preventing misuse, supporting user well-being, and upholding principles of safety, transparency, and oversight while enforcing terms of service and acceptable use policies.
Profile
What makes you a great fit
Required qualifications:
- Bachelor's degree in Computer Science, Software Engineering, or comparable experience
- 5-10+ years of experience in a software engineering position, preferably with a focus on integrity, spam, fraud, or abuse detection and mitigation
- Proficiency in Python and TypeScript
- Ability to work across the stack
- Strong communication skills and ability to explain complex technical concepts to non-technical stakeholders
Benefits
What's in it for you
Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and a lovely office space in which to collaborate with colleagues.
About
Inside Anthropic
Anthropic is a frontier AI research and product company, with teams working on alignment, policy, and security. We post specific opportunities at Anthropic that we think may be high impact. We do not necessarily recommend working in other positions at Anthropic. You can read about concerns regarding doing harm by working at a frontier AI company in our career review on the topic.