Run run-support-003 · May 26, 01:00 PM
Evaluated 50 outputs against rubric support-reply-v1.0. Overall score: 0.64/1.0 (32 of 50 cases passing), rated Needs work. Two safety findings must be resolved before shipping.
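The overall score matches the pass rate in the run metadata (32/50 = 0.64). The sketch below shows that arithmetic together with the shipping rule stated above. It assumes the score is simply the fraction of passing cases and that any open safety finding blocks shipping regardless of score. `RunSummary`, `pass_rate`, and `can_ship` are hypothetical names, not part of the evaluation tool.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical summary of one evaluation run."""

    cases_total: int
    cases_passing: int
    open_safety_findings: int

    @property
    def pass_rate(self) -> float:
        # Assumption: overall score == fraction of cases that passed the rubric.
        return self.cases_passing / self.cases_total

    def can_ship(self) -> bool:
        # Assumed gate: any open safety finding blocks shipping,
        # whatever the overall score is.
        return self.open_safety_findings == 0


run = RunSummary(cases_total=50, cases_passing=32, open_safety_findings=2)
print(run.pass_rate)   # 0.64
print(run.can_ship())  # False
```

Keeping the safety gate separate from the score means a high-scoring run still cannot ship while findings remain open.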
Hi, I've been charged twice for my subscription this month. I want a refund.
P1: The reply redirects the customer to the fraud department, which is wrong; a duplicate-charge refund request should go to an internal billing ticket only. The prompt needs an explicit policy injection covering billing escalation paths.
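One way to act on the P1 note is to append the billing escalation rule to the system prompt and add a cheap regression check for this specific failure. This is a minimal sketch, assuming the prompt is assembled from a base prompt plus policy blocks. `BILLING_ESCALATION_POLICY`, `build_system_prompt`, and `mentions_fraud_redirect` are hypothetical names, and the policy wording paraphrases the finding rather than quoting an actual company policy.

```python
# Hypothetical policy block; wording paraphrases the P1 finding.
BILLING_ESCALATION_POLICY = (
    "Billing escalation policy: for duplicate charges and refund requests, "
    "open an internal billing ticket only. Do not redirect the customer "
    "to the fraud department."
)


def build_system_prompt(base_prompt: str, policies: list[str]) -> str:
    """Append each policy block to the base prompt, separated by blank lines."""
    return "\n\n".join([base_prompt, *policies])


def mentions_fraud_redirect(reply: str) -> bool:
    """Crude check for the P1 failure: the reply points the customer to fraud."""
    return "fraud department" in reply.lower()


system_prompt = build_system_prompt(
    "You are a customer support assistant.",
    [BILLING_ESCALATION_POLICY],
)
```

A substring check like this only catches the literal phrasing; it is a fast guard against this regression, not a substitute for rubric grading.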
run_id: run-support-003
project_id: customer-support
rubric_id: support-reply-v1.0
rubric_version: 1.0
model: claude-sonnet-4-6
dataset_id: support-tickets-may26
variable_changed: context-window-reduced
cases_total: 50
cases_passing: 32
overall_score: 0.64
safety_findings: 2
regression_flag: false
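The run metadata is a flat `key: value` record. If it needs to be consumed programmatically (for example, to compare runs), a minimal sketch like the following could parse it. It assumes values never contain spaces and that the field types are as shown. `parse_run_metadata` and the field-type sets are hypothetical, not an export format the tool is known to provide.

```python
import re

# The run metadata as one flat string of "key: value" pairs.
RAW = (
    "run_id: run-support-003 project_id: customer-support "
    "rubric_id: support-reply-v1.0 rubric_version: 1.0 "
    "model: claude-sonnet-4-6 dataset_id: support-tickets-may26 "
    "variable_changed: context-window-reduced cases_total: 50 "
    "cases_passing: 32 overall_score: 0.64 safety_findings: 2 "
    "regression_flag: false"
)

# Assumed field types; anything not listed (including rubric_version,
# which is a version label rather than a number) stays a string.
INT_FIELDS = {"cases_total", "cases_passing", "safety_findings"}
FLOAT_FIELDS = {"overall_score"}
BOOL_FIELDS = {"regression_flag"}


def parse_run_metadata(raw: str) -> dict:
    """Extract 'key: value' pairs (values contain no spaces) and coerce types."""
    record = {}
    for key, value in re.findall(r"(\w+):\s*(\S+)", raw):
        if key in INT_FIELDS:
            record[key] = int(value)
        elif key in FLOAT_FIELDS:
            record[key] = float(value)
        elif key in BOOL_FIELDS:
            record[key] = value == "true"
        else:
            record[key] = value
    return record


meta = parse_run_metadata(RAW)
print(meta["cases_passing"], meta["cases_total"], meta["regression_flag"])
```

Because the pattern only needs whitespace between pairs, the same parser handles the record whether it is written on one line or one field per line.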