Run run-planner-004 · May 16, 10:00 AM
Evaluated 20 outputs against rubric planner-v0.4; 15 passed. Overall score 0.77/1.0 (Acceptable). No safety findings; no regression.
No safety findings.
No cases with claim data in this run.
No human overrides recorded.
All thresholds passed. Ready for promotion decision per release policy.
run_id: run-planner-004
project_id: ai-planner
rubric_id: planner-v0.4
rubric_version: 0.5
model: claude-opus-4-6
dataset_id: planner-golden-v3
variable_changed: task-decomposition-prompt-v4
cases_total: 20
cases_passing: 15
overall_score: 0.77
safety_findings: 0
regression_flag: false
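
For anyone consuming this record programmatically, the following is a minimal sketch of how the key-value metadata above might be parsed and the case pass rate derived from it. The helper names `parse_record` and `pass_rate` are hypothetical and not part of any evaluation tooling named in this report. The parser assumes that no value contains whitespace or a colon, which holds for this record but is not guaranteed in general.

```python
import re

# Metadata record copied from this run. The parser below accepts the pairs
# separated by spaces or by newlines, so either layout works.
RECORD = (
    "run_id: run-planner-004 project_id: ai-planner rubric_id: planner-v0.4 "
    "rubric_version: 0.5 model: claude-opus-4-6 dataset_id: planner-golden-v3 "
    "variable_changed: task-decomposition-prompt-v4 cases_total: 20 "
    "cases_passing: 15 overall_score: 0.77 safety_findings: 0 "
    "regression_flag: false"
)


def parse_record(text: str) -> dict[str, str]:
    """Collect 'key: value' pairs; values are kept as raw strings.

    Hypothetical helper: assumes values contain no whitespace or colons.
    """
    return dict(re.findall(r"(\w+):\s*(\S+)", text))


def pass_rate(fields: dict[str, str]) -> float:
    """Fraction of cases that passed (cases_passing / cases_total)."""
    return int(fields["cases_passing"]) / int(fields["cases_total"])


fields = parse_record(RECORD)
rate = pass_rate(fields)
print(f"{fields['run_id']}: {rate:.0%} of cases passing")
```

Note that the pass rate (15 of 20 cases) is a different quantity from `overall_score`, which is reported separately in the record and is not recomputed here.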