Run run-rag-qa-003 · May 20, 11:15 AM
Evaluated 30 outputs against rubric rag-qa-v2.0; 24 of 30 cases passed. Overall score: 0.79/1.0 (Acceptable). No safety findings; no regression.
No cases with claim data in this run.
No human overrides recorded.
All thresholds passed. Ready for promotion decision per release policy.
run_id: run-rag-qa-003
project_id: rag-docs-qa
rubric_id: rag-qa-v2.0
rubric_version: 2.0
model: claude-sonnet-4-6
dataset_id: rag-golden-set-v2
variable_changed: retrieval-topk-4
cases_total: 30
cases_passing: 24
overall_score: 0.79
safety_findings: 0
regression_flag: false