Configure the scoring engine, then run the claim pipeline and deterministic checks on any agent output to see exactly what gets flagged.
Defaults used by full evaluation runs. The model can still be overridden per run on /runs/new.
Pattern-based claim extraction, followed by verification against the tool trace and evidence sources.
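As a rough illustration of the idea, here is a minimal sketch of pattern-based extraction followed by verification against a tool trace. It is not the engine's actual implementation: the regular expression, the function names `extract_claims` and `verify_claims`, and the trace format (a list of strings) are all assumptions made for the example.

```python
import re

# Hypothetical pattern: numeric claims such as "42 rows" or "3 errors".
CLAIM_PATTERN = re.compile(r"\b(\d+)\s+(rows|files|errors)\b")


def extract_claims(text: str) -> list[tuple[int, str]]:
    """Pull (count, noun) claims out of a piece of text."""
    return [(int(count), noun) for count, noun in CLAIM_PATTERN.findall(text)]


def verify_claims(claims: list[tuple[int, str]], tool_trace: list[str]) -> list[dict]:
    """Mark a claim as supported if the same count and noun appear in any trace entry."""
    results = []
    for count, noun in claims:
        supported = any((count, noun) in extract_claims(entry) for entry in tool_trace)
        results.append({"claim": f"{count} {noun}", "supported": supported})
    return results


# Assumed inputs: an agent's final answer and the tool calls it made.
output = "The query returned 42 rows and I fixed 3 errors."
trace = ["sql_query -> 42 rows", "lint -> 5 errors"]

report = verify_claims(extract_claims(output), trace)
for row in report:
    print(row)
```

In this sketch the "3 errors" claim is flagged as unsupported because the trace only records 5 errors, which is the kind of mismatch the claim pipeline is meant to surface.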
Code-based rules, no LLM. Each check is auditable.
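A deterministic check of this kind can be sketched as a plain function that returns a named, inspectable result. The check names, the `CheckResult` type, and the placeholder markers below are hypothetical and only illustrate the shape such rules might take.

```python
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    name: str     # which rule ran
    passed: bool  # outcome of the rule
    detail: str   # human-readable reason, kept for auditing


def check_non_empty(output: str) -> CheckResult:
    """Fail when the output is empty or whitespace only."""
    passed = bool(output.strip())
    detail = "output has content" if passed else "output is empty"
    return CheckResult("non_empty", passed, detail)


def check_no_placeholder(output: str) -> CheckResult:
    """Fail when the output still contains placeholder text."""
    markers = ("TODO", "lorem ipsum")  # assumed markers, for illustration
    found = [m for m in markers if m.lower() in output.lower()]
    detail = f"found: {found}" if found else "no placeholder text"
    return CheckResult("no_placeholder", not found, detail)


CHECKS: list[Callable[[str], CheckResult]] = [check_non_empty, check_no_placeholder]


def run_checks(output: str) -> list[CheckResult]:
    """Run every rule in order; no model call is involved."""
    return [check(output) for check in CHECKS]


results = run_checks("Done. TODO: confirm totals.")
for result in results:
    print(result)
```

Because each rule is ordinary code with a recorded reason, the same input always yields the same result and a reviewer can trace any flag back to the rule that raised it.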