AI Safety & Governance
LLM Evaluation Engineer
Thirdlaw
Location: Remote, Global
Type: Full-time
Posted: Nov 05, 2025
Mission
What you will drive
- Build an evaluation layer that determines, in real time, whether LLM interactions comply with enterprise policies
- Design evaluation logic using semantic similarity, foundation model scoring, and rule-based systems to enforce AI safety
- Implement real-time guardrails and classifiers that connect to downstream enforcement actions such as redaction or blocking (a sketch of such a pipeline follows this list)
- Prototype and tune language models and prompt templates for classification and scoring
- Build tools to observe, debug, and improve evaluator performance across diverse real-world data distributions
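
For illustration, here is a minimal sketch of such an evaluation layer, combining the rule-based and semantic-similarity approaches named above. It assumes the sentence-transformers library; the policy exemplars, regex rules, model name, and threshold are placeholder assumptions, not Thirdlaw's actual stack.

```python
import re
from sentence_transformers import SentenceTransformer, util

# Hypothetical exemplars of content the enterprise policy prohibits.
# A real deployment would load these from policy configuration.
PROHIBITED_EXEMPLARS = [
    "instructions for bypassing the company's security controls",
    "sharing a customer's personal financial records",
]

# Rule-based layer: deterministic patterns mapped to enforcement actions.
RULES = [
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "redact"),  # card-like numbers
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "redact"),  # emails
]

model = SentenceTransformer("all-MiniLM-L6-v2")
exemplar_embeddings = model.encode(PROHIBITED_EXEMPLARS, convert_to_tensor=True)

def evaluate(text: str, block_threshold: float = 0.6) -> dict:
    """Return an enforcement verdict for one LLM interaction."""
    # 1. Rule-based checks: cheap and deterministic, so they run first.
    for pattern, action in RULES:
        if pattern.search(text):
            return {"action": action, "reason": f"rule:{pattern.pattern}"}

    # 2. Semantic layer: cosine similarity against prohibited exemplars.
    text_embedding = model.encode(text, convert_to_tensor=True)
    similarity = util.cos_sim(text_embedding, exemplar_embeddings).max().item()
    if similarity >= block_threshold:
        return {"action": "block", "reason": f"semantic:{similarity:.2f}"}

    return {"action": "allow", "reason": f"semantic:{similarity:.2f}"}

print(evaluate("Contact me at jane.doe@example.com for the records."))
# -> {'action': 'redact', 'reason': 'rule:...'}
```

A production layer would add a foundation-model scoring path alongside these two, and the threshold would be tuned against real traffic distributions rather than fixed up front.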
Impact
The difference you'll make
This role makes AI deployments safer: the evaluation systems you build enforce compliance with enterprise policies and prevent harmful or inappropriate LLM interactions in real-world applications.
Profile
What makes you a great fit
- Experience building evaluation systems for large language models
- Proficiency in semantic similarity techniques, foundation model scoring, and rule-based systems
- Ability to implement real-time guardrails and classifiers
- Experience prototyping and tuning language models and prompt templates (see the judge-prompt sketch after this list)
- Skills in building tools for observing, debugging, and improving evaluator performance
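
As a companion sketch for the foundation-model-scoring and prompt-template items above, here is one hedged way to implement LLM-as-judge scoring. It assumes the OpenAI Python client with an API key set in the environment; the model name, template wording, and 0-to-10 scale are illustrative assumptions, not a prescribed design.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical scoring template: asks a foundation model to grade an
# interaction against a policy and justify the score.
JUDGE_TEMPLATE = """You are a compliance evaluator.

Policy:
{policy}

Interaction:
{interaction}

Rate the interaction's compliance with the policy from 0 (clear violation)
to 10 (fully compliant). Respond with the integer score on the first line,
followed by a one-sentence justification."""

def score_interaction(policy: str, interaction: str) -> tuple[int, str]:
    """Use a foundation model as a judge; returns (score, justification)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable judge model works
        temperature=0,        # deterministic scoring for reproducibility
        messages=[{"role": "user", "content": JUDGE_TEMPLATE.format(
            policy=policy, interaction=interaction)}],
    )
    first_line, _, justification = response.choices[0].message.content.partition("\n")
    return int(first_line.strip()), justification.strip()
```

Pinning temperature to 0 keeps scores reproducible, which matters when evaluator performance is itself the thing being observed and debugged.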
Benefits
What's in it for you
No benefits information provided in the job description.
About
Inside Thirdlaw
No organization information provided in the job description.