Eval Runs

Stored evaluation runs. Each row is immutable; re-evaluation creates a new run.

| When | Project | Rubric | Model | Variable | Score | Pass | Verdict | Flags |
|---|---|---|---|---|---|---|---|---|
| May 26, 09:14 AM | Shadow — Daily Reflection | shadow-daily-reflection-v1.1 | gpt-4o | system-prompt-v3.1 | 0.82 | 85% | Acceptable | |
| May 19, 10:02 AM | Shadow — Daily Reflection | shadow-daily-reflection-v1.1 | gpt-4o | system-prompt-v3.0 | 0.74 | 70% | Needs work | regression |
| May 25, 02:30 PM | RAG — Internal Docs QA | rag-qa-v2.0 | claude-sonnet-4-6 | retrieval-topk-8 | 0.91 | 93% | Ship-ready | |
| May 20, 11:15 AM | RAG — Internal Docs QA | rag-qa-v2.0 | claude-sonnet-4-6 | retrieval-topk-4 | 0.79 | 80% | Acceptable | |
| May 24, 08:45 AM | Area Mosa — Booking Assistant | booking-assistant-v1.3 | gpt-4o-mini | tone-friendly-v2 | 0.88 | 92% | Ship-ready | |
| May 26, 01:00 PM | Customer Support Reply | support-reply-v1.0 | claude-sonnet-4-6 | context-window-reduced | 0.64 | 64% | Needs work | 2 safety |
| May 23, 04:20 PM | AI Planning Assistant | planner-v0.4 | claude-opus-4-6 | task-decomposition-prompt-v5 | 0.94 | 95% | Ship-ready | |
| May 16, 10:00 AM | AI Planning Assistant | planner-v0.4 | claude-opus-4-6 | task-decomposition-prompt-v4 | 0.77 | 75% | Acceptable | |
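
The rule that each row is immutable and that re-evaluation creates a new run could be modeled roughly as follows. This is only a sketch: the `EvalRun` class, its field names, and the `re_evaluate` helper are hypothetical and not taken from the actual application, and the pass rate is stored as a fraction purely for illustration. The values come from the two AI Planning Assistant rows in the table.

```python
from dataclasses import dataclass, replace, FrozenInstanceError
from typing import Tuple


# Hypothetical record type; frozen=True makes instances read-only after creation.
@dataclass(frozen=True)
class EvalRun:
    when: str
    project: str
    rubric: str
    model: str
    variable: str
    score: float
    pass_rate: float  # assumed representation: 0.75 stands for 75%
    verdict: str
    flags: Tuple[str, ...] = ()


def re_evaluate(previous: EvalRun, *, when: str, variable: str, score: float,
                pass_rate: float, verdict: str,
                flags: Tuple[str, ...] = ()) -> EvalRun:
    """Return a new run derived from `previous`; `previous` is left untouched."""
    return replace(previous, when=when, variable=variable, score=score,
                   pass_rate=pass_rate, verdict=verdict, flags=tuple(flags))


older = EvalRun(
    when="May 16, 10:00 AM",
    project="AI Planning Assistant",
    rubric="planner-v0.4",
    model="claude-opus-4-6",
    variable="task-decomposition-prompt-v4",
    score=0.77,
    pass_rate=0.75,
    verdict="Acceptable",
)

# Re-evaluating with a new prompt variable yields a second, separate row.
newer = re_evaluate(
    older,
    when="May 23, 04:20 PM",
    variable="task-decomposition-prompt-v5",
    score=0.94,
    pass_rate=0.95,
    verdict="Ship-ready",
)

# Attempting to edit a stored run in place raises an error.
try:
    older.score = 0.99  # type: ignore[misc]
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

With this shape, the run history only grows by appending, so two runs of the same project can be compared side by side without either being overwritten.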