Wiki
Practical evaluation knowledge base — the opinion layer of the tool.
New to AI evaluation?
Start with the 10-minute guide
Understand projects, rubrics, cases, runs, safety gates, and reports before diving into individual articles.
Interactive · 10 cases · ~8 min
Outputs, Please — practice mode
AI Inspection Booth №7. Label claims and catch ghost numbers, citation drift, PII leaks, and prompt injection. Each case maps to one wiki article.
Learning Paths
Product Managers
Launch readiness, reports, and regressions
AI Engineers
Rubrics, judge behavior, groundedness, claim evidence
Reviewers
Human review, safety findings, overrides
Trust & Safety
Safety gates, false confirmations, PII, unresolved blockers
Getting Started
Start Here: AI Evaluation in 10 Minutes
3 src · Core workflow, key terms, how to read an eval result, and a first demo path through the tool.
Evaluation Reports
6 src · How to read and generate a 13-section evaluation report, from verdict to appendix.
Core Concepts
Evaluation Principles
7 src · 15 core principles and 7 anti-principles that govern how evaluations are designed and interpreted.
Scoring Rubrics
6 src · 10 reference dimensions, 5 starter rubrics, weight normalization rules, and versioning strategy.
Hallucination Risk
4 src · How claims are labeled as supported, partially supported, unsupported, or contradicted.
Groundedness
4 src · Faithfulness to retrieved context: how to score source utilization and detect citation drift.
Workflows
Regression Evaluation
6 src · How to detect measurable drops between runs on the same dataset, rubric, and retrieved context.
Human Review
5 src · Queue ordering, override mechanics, two-reviewer policy, and calibration loop protocol.
Evaluators: Methods That Actually Run
How the four scoring methods work in this tool — LLM judge, claim pipeline (groundedness), deterministic checks, and human review — and how to configure and test them.
Advanced
10 articles · 17 primary sources · Read time 97 min total · Source files in projects/ai-evaluation-tool/wiki/ · Source cards in wiki/sources/source-cards.md