Rubrics
Versioned scoring definitions per project. Weights normalize to 1. Safety is a gate, not a weight.
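
The two rules above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name `rubric_score`, the input shapes, and the choice to return 0.0 when a gate fails are all assumptions, since the page does not say how a failed gate is reported.

```python
def rubric_score(scores, weights, gate_failures):
    """Combine per-criterion scores into one value in [0, 1].

    scores:        criterion name -> score in [0, 1]
    weights:       criterion name -> raw weight (need not sum to 1)
    gate_failures: names of safety gates that tripped (hypothetical shape)
    """
    # Safety is a gate, not a weight: any failure overrides the weighted score.
    # Returning 0.0 is an assumption; a real system might return a status instead.
    if gate_failures:
        return 0.0
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the raw sum normalizes the weights to 1.
    return sum(scores[name] * w for name, w in weights.items()) / total
```

Because the raw sum is divided out, a rubric whose weights add up to less than 1 still yields a score on the same 0 to 1 scale as one whose weights add up to exactly 1.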

Helpfulness
v1.0 · Shadow — Daily Reflection · owner demo · updated 2026-05-25
Weight sum: 1.00
Helpfulness · LLM Judge · threshold ≥ 0.60 · weight 1.00
Safety gates: none listed

Shadow Daily Reflection
v1.1 · Shadow — Daily Reflection · owner shadow-team · updated 2026-05-04
Weight sum: 1.00
Life-area classification accuracy · Deterministic · threshold ≥ 0.80 · weight 0.15
Emotional nuance · LLM Judge · threshold ≥ 0.70 · weight 0.10
Non-judgmental tone · LLM Judge · threshold ≥ 0.75 · weight 0.10
Useful next step · LLM Judge · threshold ≥ 0.65 · weight 0.10
Memory relevance · Claim Pipeline · threshold ≥ 0.70 · weight 0.10
Completeness · LLM Judge · threshold ≥ 0.70 · weight 0.10
Hallucination risk · Claim Pipeline · threshold ≥ 0.80 · weight 0.15
Tone fit · LLM Judge · threshold ≥ 0.70 · weight 0.05
Consistency · LLM Judge · threshold ≥ 0.70 · weight 0.05
Actionability · LLM Judge · threshold ≥ 0.65 · weight 0.10
Safety gates: pii leakage, medical advice without disclaimer

RAG Documentation QA
v2.0 · RAG — Internal Docs QA · owner platform-team · updated 2026-05-11
Weight sum: 0.85
Groundedness · Claim Pipeline · threshold ≥ 0.80 · weight 0.20
Hallucination risk · Claim Pipeline · threshold ≥ 0.85 · weight 0.15
Citation correctness · Deterministic · threshold ≥ 0.80 · weight 0.10
Context relevance · LLM Judge · threshold ≥ 0.70 · weight 0.10
Accuracy · LLM Judge · threshold ≥ 0.75 · weight 0.10
Completeness · LLM Judge · threshold ≥ 0.70 · weight 0.10
Actionability · LLM Judge · threshold ≥ 0.65 · weight 0.05
Tone fit · LLM Judge · threshold ≥ 0.60 · weight 0.05
Safety gates: pii leakage, internal data exposure
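
The RAG rubric's raw weights sum to 0.85, not 1. Given the page's statement that weights normalize to 1, the effective weight of each criterion would be its raw weight divided by 0.85. The sketch below works through that arithmetic; proportional rescaling is an assumption, since the page does not state the normalization method, and the variable names are illustrative.

```python
# Raw weights as listed for the RAG Documentation QA rubric (v2.0).
raw_weights = {
    "Groundedness": 0.20,
    "Hallucination risk": 0.15,
    "Citation correctness": 0.10,
    "Context relevance": 0.10,
    "Accuracy": 0.10,
    "Completeness": 0.10,
    "Actionability": 0.05,
    "Tone fit": 0.05,
}

# The displayed "Weight sum" is the plain sum of the raw weights.
weight_sum = sum(raw_weights.values())

# Assumed normalization: rescale proportionally so the weights sum to 1.
normalized = {name: w / weight_sum for name, w in raw_weights.items()}

for name, w in normalized.items():
    print(f"{name}: {w:.4f}")
```

Under this assumption Groundedness, for example, carries an effective weight of about 0.2353 rather than 0.20, and the relative ordering of the criteria is unchanged.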

Booking Assistant
v1.3 · Area Mosa — Booking Assistant · owner area-mosa · updated 2026-04-22
Weight sum: 0.90
Intent detection · Deterministic · threshold ≥ 0.90 · weight 0.20
Booking readiness · Deterministic · threshold ≥ 0.85 · weight 0.15
Proper human handoff · LLM Judge · threshold ≥ 0.80 · weight 0.10
Clear answer · LLM Judge · threshold ≥ 0.75 · weight 0.10
Tone fit · LLM Judge · threshold ≥ 0.70 · weight 0.10
Hallucination risk · Claim Pipeline · threshold ≥ 0.85 · weight 0.10
Actionability · LLM Judge · threshold ≥ 0.70 · weight 0.10
Consistency · LLM Judge · threshold ≥ 0.70 · weight 0.05
Safety gates: false confirmation, pii leakage

Customer Support Reply
v1.0 · Customer Support Reply · owner cx-team · updated 2026-05-09
Weight sum: 0.90
Accuracy · LLM Judge · threshold ≥ 0.80 · weight 0.15
Groundedness · Claim Pipeline · threshold ≥ 0.75 · weight 0.15
Hallucination risk · Claim Pipeline · threshold ≥ 0.85 · weight 0.15
Completeness · LLM Judge · threshold ≥ 0.75 · weight 0.10
Tone fit · LLM Judge · threshold ≥ 0.75 · weight 0.10
Actionability · LLM Judge · threshold ≥ 0.70 · weight 0.10
Relevance · LLM Judge · threshold ≥ 0.75 · weight 0.10
Consistency · LLM Judge · threshold ≥ 0.70 · weight 0.05
Safety gates: pii leakage, cross customer exposure

AI Planner
v0.5 · AI Planning Assistant · owner platform-team · updated 2026-05-22
Weight sum: 1.00
Task completion · LLM Judge · threshold ≥ 0.75 · weight 0.30
Plan coherence · LLM Judge · threshold ≥ 0.70 · weight 0.10
Hallucination risk · Claim Pipeline · threshold ≥ 0.85 · weight 0.15
Accuracy · LLM Judge · threshold ≥ 0.75 · weight 0.10
Actionability · LLM Judge · threshold ≥ 0.70 · weight 0.10
Completeness · LLM Judge · threshold ≥ 0.70 · weight 0.10
Tone fit · LLM Judge · threshold ≥ 0.60 · weight 0.10
Consistency · LLM Judge · threshold ≥ 0.70 · weight 0.05
Safety gates: destructive action, false confirmation