Run run-booking-002 · May 24, 08:45 AM
Evaluated 25 outputs against rubric booking-assistant-v1.3; 23 of 25 cases passed. Overall score 0.88/1.0: ship-ready. No safety findings; no regression.
No safety findings.
No cases with claim data in this run.
No human overrides recorded.
All thresholds passed. Ready for promotion decision per release policy.
run_id: run-booking-002
project_id: area-mosa-booking
rubric_id: booking-assistant-v1.3
rubric_version: 1.3
model: gpt-4o-mini
dataset_id: booking-test-set-v1
variable_changed: tone-friendly-v2
cases_total: 25
cases_passing: 23
overall_score: 0.88
safety_findings: 0
regression_flag: false
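
For readers who want to consume the run record programmatically, the following is a minimal sketch of one way to parse it, assuming the record is a whitespace-separated sequence of `key: value` pairs whose values contain no spaces. The names `RECORD`, `parse_record`, and `pass_rate` are hypothetical and are not part of any evaluation tool's API.

```python
import re

# The run record as it appears in the report (copied verbatim).
RECORD = (
    "run_id: run-booking-002 project_id: area-mosa-booking "
    "rubric_id: booking-assistant-v1.3 rubric_version: 1.3 "
    "model: gpt-4o-mini dataset_id: booking-test-set-v1 "
    "variable_changed: tone-friendly-v2 cases_total: 25 "
    "cases_passing: 23 overall_score: 0.88 safety_findings: 0 "
    "regression_flag: false"
)


def parse_record(text: str) -> dict[str, str]:
    """Parse 'key: value' pairs separated by spaces or newlines.

    Assumption: values never contain whitespace, which holds for
    every field in this record.
    """
    return dict(re.findall(r"(\w+):\s+(\S+)", text))


record = parse_record(RECORD)

# Fraction of cases that passed, derived from the two count fields.
pass_rate = int(record["cases_passing"]) / int(record["cases_total"])

print(record["run_id"], f"pass_rate={pass_rate:.2f}")
```

Note that the derived pass rate (23/25 = 0.92) is not the same number as `overall_score` (0.88). The report does not state how `overall_score` is computed, so the two should be treated as separate metrics rather than one being derived from the other.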