Run run-rag-qa-004 · May 25, 02:30 PM
Evaluated 30 outputs against rubric rag-qa-v2.0. Overall score 0.91/1.0 — Ship-ready. No safety findings; no regression.
No safety findings.
How does the memory graph differ from the chunk index?
What chunking strategy does the RAG Memory Playground use by default?
No human overrides recorded.
All thresholds passed. Ready for promotion decision per release policy.
run_id: run-rag-qa-004
project_id: rag-docs-qa
rubric_id: rag-qa-v2.0
rubric_version: 2.0
model: claude-sonnet-4-6
dataset_id: rag-golden-set-v2
variable_changed: retrieval-topk-8
cases_total: 30
cases_passing: 28
overall_score: 0.91
safety_findings: 0
regression_flag: false