VectifyAI/PageIndex is the highest-profile vectorless RAG implementation as of May 2026, with 29.7K GitHub stars and +953 in a single day on May 7. The pitch is structural: no embeddings, no chunking, no vector DB. Instead, an LLM reads the document end-to-end and emits a table-of-contents-style tree. At query time, the LLM walks that tree, expanding promising branches and ignoring irrelevant ones, and returns the specific section(s) that answer the question. The headline benchmark is 98.7% accuracy on FinanceBench, territory where vector RAG typically lands at 70-85%. PageIndex's strongest use cases are long structured documents (financial filings, legal contracts, regulatory filings, policy docs), where "similar but wrong" is the dominant failure mode of embedding retrieval. For multi-document or noisy corpora, vector RAG still scales better. PageIndex ships as an open-source library, a hosted cloud service at pageindex.ai, and an MCP server (pageindex-mcp) that slots into any agent harness (Claude Code, Cursor SDK, openclaw) as a tool. MIT licensed.
PageIndex represents the cleanest implementation of the vectorless RAG pattern that emerged in 2025-2026 as agent operators hit retrieval-quality walls with embedding-based pipelines. Instead of chunking documents into ~500-token windows, embedding each chunk, and running cosine similarity at query time, PageIndex performs two LLM-mediated steps: (1) at indexing time, an LLM reads the document end-to-end and emits a table-of-contents-style tree of sections; (2) at query time, an LLM walks that tree, expanding promising branches and pruning irrelevant ones until it reaches the section(s) that answer the question.
The retrieval is reasoning, not similarity. That distinction is load-bearing for the workloads where PageIndex wins — long structured documents where similar text appears in many places (older versions of the same policy, deprecated APIs, discussion of the feature without the spec) and “similar but wrong” is the dominant failure mode of vector retrieval.
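The tree walk can be sketched in a few lines. Everything below is illustrative, not PageIndex's actual internals: the `Node` structure, the `llm_pick_branches` stub (keyword matching standing in for a real LLM relevance call), and the traversal loop are all assumptions about how a reasoning-based retriever of this shape works.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # One entry in the table-of-contents tree: a section title,
    # a short summary, and child sections.
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)

def llm_pick_branches(query: str, nodes: list[Node]) -> list[Node]:
    # Stub for the LLM call: given the query and candidate sections,
    # return the branches worth expanding. A real system would prompt
    # a model with the titles/summaries; here we keyword-match.
    return [n for n in nodes
            if any(w in (n.title + " " + n.summary).lower()
                   for w in query.lower().split())]

def tree_search(query: str, root: Node) -> list[Node]:
    # Expand promising branches, ignore the rest. A node whose subtree
    # yields no further expansion is returned as a retrieved section.
    hits, frontier = [], [root]
    while frontier:
        node = frontier.pop()
        picked = llm_pick_branches(query, node.children)
        if not picked and node is not root:
            hits.append(node)  # nothing deeper to expand: this is a hit
        frontier.extend(picked)
    return hits
```

The point the sketch makes concrete: relevance decisions happen per branch, with the section's title and summary in context, rather than once per chunk in a frozen embedding space.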
For the broader decision framework on when to pick PageIndex vs an embedding-based stack, see Vectorless RAG: PageIndex vs Embedding RAG Decision Guide.
The vectorless RAG framing is one of three convergent signals on agent retrieval this quarter: a Microsoft Tech Community post arguing for reasoning-based retrieval, a DigitalOcean "Beyond Vector Databases" tutorial, and LlamaIndex's own "RAG is dead, long live agentic retrieval" essay. PageIndex sits at the open-source center of that conversation.
The cost economics also reinforce it: tree search costs more LLM calls per query, but with DeepSeek V4 Flash at $0.14/M input tokens, the marginal cost of a deeper, more correct retrieval is essentially zero. The cheaper inference gets, the better vectorless RAG looks.
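A back-of-envelope check on that claim. The per-query call count and per-call token sizes below are assumptions for illustration; only the $0.14/M input rate comes from the text:

```python
# Assumed shape of one tree-search retrieval: a handful of LLM calls,
# each reading a slice of the TOC tree plus node summaries.
calls_per_query = 8            # assumption: depth x branches examined
input_tokens_per_call = 4_000  # assumption: TOC slice + summaries
price_per_million = 0.14       # DeepSeek V4 Flash input rate cited above

tokens = calls_per_query * input_tokens_per_call
cost = tokens / 1_000_000 * price_per_million
print(f"{tokens} input tokens, roughly ${cost:.4f} per query")  # ~ $0.0045
```

Even if the real call count is several times higher, retrieval cost stays well under a cent per query, which is the economic argument for spending more reasoning per retrieval.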
The fastest agent integration is the pageindex-mcp server, which exposes PageIndex as a Model Context Protocol tool. Any MCP-aware harness (Claude Code, Cursor, openclaw) can call it the same way it calls any other tool — no SDK rewrite, no architecture overhaul.
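Registration follows the standard MCP server-config shape. The entry below is a sketch only: the `mcpServers` structure is the common convention, but the launch command, arguments, and env var name are assumptions; check the pageindex-mcp README for the real values.

```json
{
  "mcpServers": {
    "pageindex": {
      "command": "npx",
      "args": ["-y", "pageindex-mcp"],
      "env": { "PAGEINDEX_API_KEY": "..." }
    }
  }
}
```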
For Python-direct integration, the open-source library exposes the same index-then-query flow:

```python
from pageindex import PageIndex

# Build the table-of-contents tree once per document.
index = PageIndex.from_pdf("10K-2025.pdf")

# Query via tree search; the result carries page-level citations.
result = index.query("Item 1A risk factor coverage on cybersecurity 2024 vs 2025")
print(result.cited_pages)
```