AI Safety & Governance · Full-time

LLM Evaluation Engineer

Thirdlaw

Location

Remote, Global

Type

Full-time

Posted

Nov 05, 2025

Mission

What you will drive

  • Build an evaluation layer that determines, in real time, whether LLM interactions comply with enterprise policies
  • Design evaluation logic using semantic similarity, foundation model scoring, and rule-based systems to enforce AI safety
  • Implement real-time guardrails and classifiers that trigger downstream enforcement actions such as redaction or blocking
  • Prototype and tune language models and prompt templates for classification and scoring purposes
  • Build tools to observe, debug, and improve evaluator performance across diverse real-world data distributions
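To make the responsibilities above concrete, here is a minimal sketch of the kind of evaluation layer the role describes: a rule-based check combined with a similarity check that maps to enforcement actions. All names and thresholds are illustrative assumptions, and `difflib` stands in for the embedding- or foundation-model-based scoring a production system would use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical policy inputs -- a real deployment would load these from an
# enterprise policy store rather than hard-coding them.
BLOCKED_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # e.g. SSN-shaped strings
POLICY_EXEMPLARS = ["share confidential customer records"]

def evaluate(message: str, threshold: float = 0.6) -> str:
    """Return an enforcement action for a message: 'redact', 'block', or 'allow'."""
    # Rule-based layer: hard pattern matches trigger redaction.
    if any(p.search(message) for p in BLOCKED_PATTERNS):
        return "redact"
    # Similarity layer: difflib is a crude stand-in for semantic similarity
    # against known policy-violating exemplars.
    for exemplar in POLICY_EXEMPLARS:
        if SequenceMatcher(None, message.lower(), exemplar).ratio() >= threshold:
            return "block"
    return "allow"
```

A production evaluator would replace the similarity stand-in with embedding comparisons or a scoring prompt to a foundation model, but the control flow (score, then route to an enforcement action) is the same shape.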

Impact

The difference you'll make

This role develops the systems that keep AI interactions safe and compliant with enterprise policies, preventing harmful or inappropriate outputs in real-world applications.

Profile

What makes you a great fit

  • Experience building evaluation systems for large language models
  • Proficiency in semantic similarity techniques, foundation model scoring, and rule-based systems
  • Ability to implement real-time guardrails and classifiers
  • Experience prototyping and tuning language models and prompt templates
  • Skills in building tools for observing, debugging, and improving evaluator performance

Benefits

What's in it for you

No benefits information provided in the job description.

About

Inside Thirdlaw

No organization information provided in the job description.