Run run-rag-qa-004

case-rag-004-02

RAG — Internal Docs QA · May 25, 02:30 PM

Case score: 0.93 / 1.0
Input
How does the memory graph differ from the chunk index?
Expected behavior
Explain that the memory graph stores entity-level structured memories (MemoryBlocks) while the chunk index stores raw document chunks. Both are queryable but serve different retrieval modes.
AI output (heat map)
The chunk index and the memory graph are complementary retrieval layers. The chunk index stores raw document fragments — text as-ingested, structured only by position and overlap. The memory graph stores higher-level abstractions: MemoryBlocks typed as Feature, Decision, Risk, Todo, or Concept, extracted from those chunks during ingest. When you query, the engine decides which layer to hit based on query intent — short factual questions typically hit the chunk index; relational or reasoning questions hit the memory graph.
Legend: supported · partial · unsupported · contradicted
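The intent-based routing described in the output can be sketched as follows. This is a minimal illustration only: the source says a query intent classifier picks the layer but does not describe it, so the keyword heuristic, the `Layer` enum, the `RELATIONAL_CUES` list, and `route_query` are all hypothetical stand-ins.

```python
from enum import Enum


class Layer(Enum):
    """The two retrieval layers named in the answer."""
    CHUNK_INDEX = "chunk_index"
    MEMORY_GRAPH = "memory_graph"


# Hypothetical cue phrases suggesting a relational or reasoning question.
# The real classifier is not specified in the retrieved context.
RELATIONAL_CUES = (
    "why", "how does", "differ", "compare",
    "relationship", "depends", "trade-off",
)


def route_query(query: str) -> Layer:
    """Route relational/reasoning questions to the memory graph,
    and everything else (short factual lookups) to the chunk index."""
    q = query.lower()
    if any(cue in q for cue in RELATIONAL_CUES):
        return Layer.MEMORY_GRAPH
    return Layer.CHUNK_INDEX


if __name__ == "__main__":
    print(route_query("How does the memory graph differ from the chunk index?"))
    print(route_query("What is the default chunk size?"))
```

A production router would more likely use a trained classifier than substring matching; the sketch only shows the shape of the decision.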
Retrieved context
  • [1] THEORY.md §4: 'MemoryBlocks are typed semantic extractions over chunks. Types: Feature, Decision, Risk, Todo, Concept.'
  • [2] architecture.md §5: 'Retrieval routing: query intent classifier routes to dense chunk retrieval or memory block retrieval based on query type.'
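
To make the chunk versus MemoryBlock distinction in [1] concrete, here is one possible data model. Only the five type names come from THEORY.md §4; the class names `Chunk`, `BlockType`, and `MemoryBlock` as Python types, and every field on them, are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class BlockType(Enum):
    """The five MemoryBlock types listed in THEORY.md §4."""
    FEATURE = "Feature"
    DECISION = "Decision"
    RISK = "Risk"
    TODO = "Todo"
    CONCEPT = "Concept"


@dataclass(frozen=True)
class Chunk:
    """A raw document fragment (hypothetical fields)."""
    doc: str        # source document name
    position: int   # ordinal position of the chunk within the document
    text: str       # text as ingested


@dataclass(frozen=True)
class MemoryBlock:
    """A typed semantic extraction over one or more chunks (hypothetical fields)."""
    type: BlockType
    summary: str
    source_chunks: tuple[Chunk, ...] = ()


if __name__ == "__main__":
    chunk = Chunk(doc="architecture.md", position=5, text="Retrieval routing: ...")
    block = MemoryBlock(BlockType.DECISION, "Route queries by intent", (chunk,))
    print(block.type.value, len(block.source_chunks))
```

Keeping a reference from each block back to its source chunks is one way to let a memory-graph hit still be traced to the raw text; whether the real system does this is not stated in the retrieved context.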

Claims (3)

  • [supported · 0.91] chunk index stores raw document fragments — text as-ingested, structured only by position and overlap
    THEORY.md establishes chunk vs block distinction.
  • [supported · 0.99] MemoryBlocks typed as Feature, Decision, Risk, Todo, or Concept
    THEORY.md §4 lists exactly these types.
  • [supported · 0.87] short factual questions typically hit the chunk index; relational or reasoning questions hit the memory graph
    architecture.md §5 describes intent-based routing.

Dimension scores

Groundedness · Claim Pipeline · 0.94 0.80

All claims grounded in retrieved context.

Hallucination risk · Claim Pipeline · 0.93 0.85

No fabricated types or routing rules.

Citation correctness · Deterministic · 0.88 0.80

Sources correctly used. No explicit citation in answer text (acceptable).

Context relevance · LLM Judge · 0.96 0.70

Both retrieved chunks directly answer the question.

Accuracy · LLM Judge · 0.94 0.75

Distinction explained correctly.

Completeness · LLM Judge · 0.91 0.70

Core distinction clear. Could mention retrieval fallback.

Actionability · LLM Judge · 0.82 0.65

Good conceptual answer; the user can act on this to configure the retrieval mode.

Tone fit · LLM Judge · 0.93 0.60

Technical and clear.