AE

Reports

One exportable report per completed run. Template: 13-section evaluation report format from the wiki.

Reports are generated from run data. Click Export .md to download a markdown file for stakeholder distribution.
Customer Support ReplyNeeds work2 safety
run-support-003May 26, 01:00 PMmodel: claude-sonnet-4-6rubric: support-reply-v1.0changed: context-window-reduced
64.0%
32/50 passing
Export .md
Shadow — Daily ReflectionAcceptable
run-shadow-dr-003May 26, 09:14 AMmodel: gpt-4orubric: shadow-daily-reflection-v1.1changed: system-prompt-v3.1
82.0%
34/40 passing
Export .md
RAG — Internal Docs QAShip-ready
run-rag-qa-004May 25, 02:30 PMmodel: claude-sonnet-4-6rubric: rag-qa-v2.0changed: retrieval-topk-8
91.0%
28/30 passing
Export .md
Area Mosa — Booking AssistantShip-ready
run-booking-002May 24, 08:45 AMmodel: gpt-4o-minirubric: booking-assistant-v1.3changed: tone-friendly-v2
88.0%
23/25 passing
Export .md
AI Planning AssistantShip-ready
run-planner-005May 23, 04:20 PMmodel: claude-opus-4-6rubric: planner-v0.4changed: task-decomposition-prompt-v5
94.0%
19/20 passing
Export .md
RAG — Internal Docs QAAcceptable
run-rag-qa-003May 20, 11:15 AMmodel: claude-sonnet-4-6rubric: rag-qa-v2.0changed: retrieval-topk-4
79.0%
24/30 passing
Export .md
Shadow — Daily ReflectionNeeds workregression
run-shadow-dr-002May 19, 10:02 AMmodel: gpt-4orubric: shadow-daily-reflection-v1.1changed: system-prompt-v3.0
74.0%
28/40 passing
Export .md
AI Planning AssistantAcceptable
run-planner-004May 16, 10:00 AMmodel: claude-opus-4-6rubric: planner-v0.4changed: task-decomposition-prompt-v4
77.0%
15/20 passing
Export .md