About RAG-Anything

RAG-Anything is a unified framework for retrieval-augmented generation across all content modalities, developed by HKUDS, the Data Intelligence Lab at the University of Hong Kong. Where most RAG systems handle only text or require a separate pipeline for each content type, RAG-Anything processes PDFs, Office documents, images, tables, and mathematical formulas through a single integrated system. Its architecture has three stages: multimodal parsing (text extraction, image analysis, formula recognition, table parsing), cross-modal knowledge graph construction (entity and relationship extraction across modalities), and hybrid retrieval (graph and vector search combined). The repository had 16K stars and 1.9K forks as of April 2026. Integration with LightRAG Server is in progress.

Key Features

Multimodal document parsing: text, images, tables, mathematical formulas

Cross-modal knowledge graph: entities and relationships across content types

Hybrid retrieval: graph search + vector search combined

Processes PDFs, Office files, images in one unified pipeline

16K GitHub stars, 1.9K forks (as of April 2026)

LightRAG Server integration in progress

Research lineage: same lab as LightRAG (EMNLP 2025)

Query complex documents with interleaved text, visuals, tables, formulas

Overview

RAG-Anything solves one of the most persistent frustrations in production RAG deployments: most real-world documents aren’t plain text. Annual reports have tables. Research papers have formulas. Technical manuals have diagrams. Standard text-based RAG pipelines handle these poorly or not at all.

RAG-Anything, from HKUDS (the Data Intelligence Lab at the University of Hong Kong, and the same team behind LightRAG), processes all of these modalities through a single integrated framework. You upload a document with mixed content and query it in natural language; the system handles extraction, indexing, and retrieval across content types.

Technical Architecture

Stage 1 — Multimodal parsing: Separate modules handle text extraction, image analysis (vision models), mathematical formula recognition (LaTeX parsing), and table parsing (structured data extraction). Each content type is processed by a specialized module, then unified into a common representation.
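
The "specialized module, then common representation" idea can be sketched in a few lines of Python. This is an illustrative sketch only, not RAG-Anything's actual API: the `ContentItem` schema, the handler functions, and the input block format are all assumptions made for the example, and the image handler simply reuses a caption where a real system would call a vision model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ContentItem:
    """Common representation shared by every modality (hypothetical schema)."""
    modality: str  # "text", "image", "table", or "formula"
    page: int
    text: str      # textual form used later for indexing


def parse_text(raw: dict) -> str:
    return raw["content"].strip()


def parse_image(raw: dict) -> str:
    # Placeholder: a real pipeline would describe the image with a vision model.
    return f"Figure: {raw.get('caption', 'no caption')}"


def parse_table(raw: dict) -> str:
    # First row is treated as the header; each later row becomes "col=value" pairs.
    header, *rows = raw["rows"]
    lines = [", ".join(f"{h}={v}" for h, v in zip(header, row)) for row in rows]
    return "Table: " + "; ".join(lines)


def parse_formula(raw: dict) -> str:
    return f"Formula (LaTeX): {raw['latex']}"


# One specialized handler per content type.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "text": parse_text,
    "image": parse_image,
    "table": parse_table,
    "formula": parse_formula,
}


def parse_document(blocks: List[dict]) -> List[ContentItem]:
    """Route each block to its handler and collect unified items."""
    items = []
    for block in blocks:
        handler = HANDLERS.get(block["type"])
        if handler is None:
            continue  # unknown modality: skipped in this sketch
        items.append(ContentItem(block["type"], block.get("page", 0), handler(block)))
    return items
```

Because every handler returns the same `ContentItem` shape, the later stages can index and link items without caring which modality they came from.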

Stage 2 — Cross-modal knowledge graph construction: Entities and relationships are extracted across all content types and organized into a knowledge graph. A table mentioning the same entity as a caption on an adjacent figure creates a graph edge linking them. This is the key innovation: retrieval can traverse relationships across modalities, not just within them.
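
As a rough illustration of the table-and-caption example above, the sketch below links any two items that mention the same entity. It is a simplification under stated assumptions, not the project's implementation: entity extraction is reduced to case-insensitive substring matching against a supplied list (`known_entities`), and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_cross_modal_graph(
    items: List[Tuple[str, str, str]],  # (item_id, modality, text)
    known_entities: List[str],
) -> Tuple[Dict[str, Set[str]], Dict[Tuple[str, str], Set[str]]]:
    """Return entity mentions and edges between items that share an entity."""
    # entity -> ids of items mentioning it, regardless of modality
    mentions: Dict[str, Set[str]] = defaultdict(set)
    for item_id, _modality, text in items:
        lowered = text.lower()
        for entity in known_entities:
            if entity.lower() in lowered:
                mentions[entity].add(item_id)

    # (id_a, id_b) -> entities the two items have in common
    edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for entity, ids in mentions.items():
        for a, b in combinations(sorted(ids), 2):
            edges[(a, b)].add(entity)
    return dict(mentions), dict(edges)


items = [
    ("t1", "table", "Table: product=Widget X, revenue=10"),
    ("f1", "image", "Figure: Widget X sales by quarter"),
    ("p1", "text", "The outlook is stable."),
]
mentions, edges = build_cross_modal_graph(items, ["Widget X"])
# The table and the figure caption both mention "Widget X", so they are linked.
```

A production system would extract entities and typed relationships with a language model rather than string matching; the point here is only that edges are keyed on shared entities, so they can cross modality boundaries.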

Stage 3 — Hybrid retrieval: Queries use combined graph retrieval (following entity-relationship paths) and vector retrieval (semantic similarity). The hybrid approach captures both structured knowledge (graph) and fuzzy semantic relevance (vector).
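
One simple way to combine the two signals is a weighted sum, sketched below. Everything here is an assumption for illustration: the bag-of-words `embed` stands in for a real embedding model, the graph score is taken as the best vector score among an item's graph neighbors, and the `alpha` weight is arbitrary. RAG-Anything's actual scoring may differ.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hybrid_search(
    query: str,
    docs: Dict[str, str],              # item_id -> indexed text
    neighbors: Dict[str, List[str]],   # item_id -> graph-adjacent item ids
    alpha: float = 0.7,                # weight of the vector score
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    q = embed(query)
    vector_scores = {doc_id: cosine(q, embed(text)) for doc_id, text in docs.items()}

    combined = {}
    for doc_id in docs:
        # Graph signal: an item inherits relevance from its best-matching neighbor.
        graph_score = max(
            (vector_scores[n] for n in neighbors.get(doc_id, ())), default=0.0
        )
        combined[doc_id] = alpha * vector_scores[doc_id] + (1 - alpha) * graph_score

    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


docs = {
    "p1": "widget x revenue grew",
    "t1": "table of quarterly numbers",
    "p2": "unrelated weather report",
}
neighbors = {"p1": ["t1"], "t1": ["p1"]}
ranked = hybrid_search("widget x revenue", docs, neighbors)
```

In this toy run the table `t1` shares no words with the query, so vector search alone would score it zero; it still surfaces because it is linked in the graph to the matching paragraph `p1`.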

Use Cases

RAG-Anything is strongest for document-heavy knowledge bases: scientific papers (heavy on formulas and figures), financial reports (tables and text interleaved), legal documents (structured sections with references), and technical documentation (code, diagrams, and prose).

Research Lineage

RAG-Anything comes from the same lab that produced LightRAG (EMNLP 2025), which has become one of the most-cited RAG frameworks of 2025. HKUDS has a strong track record of translating academic RAG research into practical, deployable systems.
