AI Safety & Governance · Full-time

LLM Evaluation Engineer

Thirdlaw

Location

Remote, Global

Type

Full-time

Posted

Nov 05, 2025

Mission

What you will drive

  • Build an evaluation layer that determines, in real time, whether LLM interactions comply with enterprise policies
  • Design evaluation logic using semantic similarity, foundation model scoring, and rule-based systems to enforce AI safety
  • Implement real-time guardrails and classifiers that trigger downstream enforcement actions such as redaction or blocking
  • Prototype and tune language models and prompt templates for classification and scoring purposes
  • Build tools to observe, debug, and improve evaluator performance across diverse real-world data distributions
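To make the responsibilities above concrete, here is a minimal sketch of the kind of evaluation layer the role describes: a rule-based check combined with a similarity check that maps to enforcement actions. All names and thresholds are illustrative assumptions, and `difflib` stands in for the embedding- or foundation-model-based scoring a production system would use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical policy inputs -- a real deployment would load these from an
# enterprise policy store rather than hard-coding them.
BLOCKED_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # e.g. SSN-shaped strings
POLICY_EXEMPLARS = ["share confidential customer records"]

def evaluate(message: str, threshold: float = 0.6) -> str:
    """Return an enforcement action for a message: 'redact', 'block', or 'allow'."""
    # Rule-based layer: hard pattern matches trigger redaction.
    if any(p.search(message) for p in BLOCKED_PATTERNS):
        return "redact"
    # Similarity layer: difflib is a crude stand-in for semantic similarity
    # against known policy-violating exemplars.
    for exemplar in POLICY_EXEMPLARS:
        if SequenceMatcher(None, message.lower(), exemplar).ratio() >= threshold:
            return "block"
    return "allow"
```

A production evaluator would replace the similarity stand-in with embedding comparisons or a scoring prompt to a foundation model, but the control flow (score, then route to an enforcement action) is the same shape.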

Impact

The difference you'll make

This role develops the systems that keep AI interactions safe and compliant with enterprise policies, preventing harmful or inappropriate outputs in real-world applications.

Profile

What makes you a great fit

  • Experience building evaluation systems for large language models
  • Proficiency in semantic similarity techniques, foundation model scoring, and rule-based systems
  • Ability to implement real-time guardrails and classifiers
  • Experience prototyping and tuning language models and prompt templates
  • Skills in building tools for observing, debugging, and improving evaluator performance

Benefits

What's in it for you

No benefits information provided in the job description.

About

Inside Thirdlaw

No organization information provided in the job description.