
StructRAG: Structured RAG for LLMs

Updated 1 January 2026
  • StructRAG is a family of Retrieval-Augmented Generation frameworks that leverages structural representations like tables, graphs, and layouts to improve LLM reasoning.
  • It converts unstructured text into organized formats using hybrid routing, structured induction, and reinforcement learning for precise fact aggregation.
  • Evaluated on benchmarks such as SPIQA and 2WikiMultiHopQA, StructRAG consistently outperforms standard RAG systems on multi-hop and aggregative queries.

StructRAG refers to a family of Retrieval-Augmented Generation (RAG) frameworks for LLMs that incorporate explicit structural representations (such as tables, graphs, catalogues, and layout-aware graphs) at ingestion time or inference time, enabling robust, high-accuracy reasoning over knowledge-intensive or aggregative queries that span large, heterogeneous corpora. These frameworks improve on standard RAG systems, which typically operate over unstructured text chunks concatenated into the prompt for generation, by adding structural induction, hybrid structure selection, graph-based retrieval, and dynamic schema adaptation. Recent variants have demonstrated marked improvements in answer recall, exact match, and semantic reasoning metrics on public and bespoke benchmarks. StructRAG encompasses the approaches described in "StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization" (Li et al., 2024), "Structure-R1: Dynamically Leveraging Structural Knowledge in LLM Reasoning through Reinforcement Learning" (Wu et al., 16 Oct 2025), "Retrieve-DocumentRoute-Read" (Xu et al., 5 Oct 2025), "Structured RAG for Answering Aggregative Questions" (Koshorek et al., 11 Nov 2025), and "SuperRAG: Beyond RAG with Layout-Aware Graph Modeling" (Yang et al., 28 Feb 2025).

1. Motivation and Limitations of Standard RAG

StructRAG arises from the observation that many knowledge-intensive and aggregative queries require gathering and synthesizing facts distributed across dozens or hundreds of documents, often with fine-grained filters or multi-step reasoning over a large context. Standard RAG systems are limited by:

  • Fixed-size retrieval sets (completeness bottleneck)
  • LLM context window size constraints (bounded context size)
  • Lack of structural awareness (e.g., inability to leverage document hierarchy, layout, or table semantics)
  • Difficulty handling multi-hop, aggregative, or global reasoning when information is scattered or multimodal

Real-world examples include "What is the average revenue of companies founded before 2010?" or "Which hotel has the highest guest rating in Paris with ≥4 stars?" Classical RAG pipelines retrieve a few top chunks (potentially omitting key records) and concatenate them for generation, relying on embedding similarity that is ill-suited for precise, schema-aware queries (Koshorek et al., 11 Nov 2025).

2. Core Methodological Components

StructRAG frameworks share several innovations:

  • Hybrid Structure Router: An LLM-based policy module that, given a question and core document summaries, selects among several structure types (e.g., table, graph, algorithm, catalogue, chunk) using a scoring function and is trained via Direct Preference Optimization (DPO) (Li et al., 2024).
  • Scattered Knowledge Structurizer: Converts raw text into the selected structure format at inference or ingestion—serializing facts as tables, triple graphs, algorithms, or structured headings, depending on the router output (Li et al., 2024).
  • Structured Knowledge Utilizer: Decomposes the query into sub-questions, extracts relevant facts from the structural representation, and synthesizes the answer via further LLM prompting (Li et al., 2024).
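The three-stage pipeline above (route, structurize, utilize) can be sketched in plain Python. All function names and the routing heuristic below are illustrative stand-ins for LLM calls, not the paper's implementation:

```python
# Illustrative sketch of the StructRAG inference pipeline (Li et al., 2024).
# The router, structurizer, and utilizer here are toy stand-ins for LLM
# calls; names and heuristics are hypothetical.

STRUCTURE_TYPES = ["table", "graph", "algorithm", "catalogue", "chunk"]

def route(question: str, summaries: list[str]) -> str:
    """Hybrid structure router: pick a structure type for this question."""
    # Stand-in heuristic; the real router is a DPO-trained LLM policy.
    if any(w in question.lower() for w in ("average", "total", "highest")):
        return "table"
    if "between" in question.lower():
        return "graph"
    return "chunk"

def structurize(docs: list[str], structure_type: str) -> dict:
    """Scattered knowledge structurizer: serialize facts into the chosen format."""
    return {"type": structure_type, "facts": [d.strip() for d in docs]}

def utilize(question: str, structure: dict) -> str:
    """Structured knowledge utilizer: extract relevant facts, synthesize answer."""
    relevant = [f for f in structure["facts"]
                if any(tok in f.lower() for tok in question.lower().split())]
    return " ; ".join(relevant) or "no evidence found"

docs = ["Acme was founded in 2004.", "Acme revenue was $3M in 2023."]
chosen = route("What was the average revenue of Acme?", docs)
answer = utilize("revenue of Acme", structurize(docs, chosen))
```

In the actual system each stage is a separate LLM prompt, and the router's scoring function is trained with DPO rather than hand-written rules.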

More advanced systems, such as Structure-R1, employ reinforcement learning (Group Relative Policy Optimization, GRPO) to dynamically invent and adapt structural schemas per query, rather than restricting to a fixed format set (Wu et al., 16 Oct 2025). Structure verification via self-reward ensures that the generated structure encodes all necessary facts and can yield a correct answer in isolation.
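The self-reward verification idea can be illustrated with a toy check: a structured block earns reward only if the answer is recoverable from the block alone, with the raw documents hidden. The functions below are hypothetical simplifications, not Structure-R1's reward model:

```python
# Toy sketch of structure self-verification (cf. Structure-R1's self-reward):
# reward is granted only when the structured block is self-contained enough
# to yield the gold answer in isolation. All names are illustrative.

def answer_from_structure(structure: dict, question: str) -> str:
    """Stand-in for an LLM answering *only* from the structured block."""
    return structure.get(question, "")

def self_reward(structure: dict, question: str, gold: str) -> float:
    """1.0 if the structure alone suffices to recover the gold answer."""
    return 1.0 if answer_from_structure(structure, question) == gold else 0.0

struct = {"founding_year": "2004"}
reward = self_reward(struct, "founding_year", "2004")   # complete structure
penalty = self_reward(struct, "ceo", "J. Doe")          # missing fact
```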

3. Structural Induction, Representation, and Routing

Approaches vary in the timing and mechanism of structural induction:

  • Ingestion-time Structuring (S-RAG): Schema induction (via LLM) over a sample document/query set to predict a flat JSON schema. Each document is mapped (field-wise) to a structured tuple, aggregated into a relational table, with attribute statistics computed for SQL-guided inference (Koshorek et al., 11 Nov 2025).
  • Inference-time Structuring (StructRAG, Structure-R1): The system retrieves raw documents and reconstructs them into the optimal structure type for the query, motivated by cognitive fit and cognitive load principles (humans restructure information for efficient reasoning) (Li et al., 2024). In Structure-R1, format blocks (<format: FORMAT>...</format: FORMAT>) and chain-of-thought (> ...) blocks are interleaved, and the agent may invent new schemas mid-reasoning (Wu et al., 16 Oct 2025).
  • Document Structure Awareness (RDR2): An LLM router explicitly navigates hierarchical document trees (headings, passages), dynamically expanding and including or excluding sections using [EXPAND], [ANS], [REF] actions, providing a fine-grained document-routing mechanism for evidence acquisition (Xu et al., 5 Oct 2025).
  • Layout-Aware Graph Modeling (SuperRAG): Document layouts (PDF, figures, tables) are parsed into property graphs. Nodes represent text chunks, table cells, diagrams, etc., with edges encoding hierarchy, spatial adjacency, sequential order, and semantic links. Retrieval combines graph traversal, vector similarity, and full-text search over graph-embedded representations, often followed by GNN-based feature propagation (Yang et al., 28 Feb 2025).
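A layout-aware property graph of the kind SuperRAG builds can be sketched minimally: typed nodes for chunks, table cells, and figures, with typed edges for hierarchy and order. The schema and traversal below are illustrative, not SuperRAG's actual data model:

```python
# Minimal property-graph sketch in the spirit of SuperRAG's layout-aware
# modeling: typed nodes and typed edges over parsed document layout.
# The node kinds, edge types, and API are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    kind: str          # e.g. "section", "chunk", "table_cell", "figure"
    text: str

@dataclass
class LayoutGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)   # (src, dst, edge_type)

    def add(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def link(self, src: str, dst: str, edge_type: str) -> None:
        self.edges.append((src, dst, edge_type))

    def children(self, node_id: str, edge_type: str = "hierarchy") -> list:
        return [self.nodes[d] for s, d, t in self.edges
                if s == node_id and t == edge_type]

g = LayoutGraph()
g.add(Node("sec1", "section", "4. Results"))
g.add(Node("tbl1", "table_cell", "Revenue: $3M"))
g.link("sec1", "tbl1", "hierarchy")
hits = [n.text for n in g.children("sec1")]
```

Retrieval in SuperRAG then combines traversal over such a graph with vector and full-text search; the sketch shows only the structural substrate.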

4. Reasoning and Query Translation

With structural representations established, StructRAG approaches use formal query translation and structured reasoning:

  • NL→SQL Translation (S-RAG): For relational table-structured corpora, LLMs are prompted with the schema and attribute statistics to produce SQL queries, which are executed over the table to yield direct answers (scalars or small result sets); an LLM then post-processes the result into the final textual output (Koshorek et al., 11 Nov 2025).
  • Sub-question Decomposition and Fact Extraction (StructRAG, Structure-R1): The utilizer module decomposes complex queries, identifies relevant sub-structures, and synthesizes answers over extracted facts. Structure-R1 alternates between reasoning and format construction, using self-reward to ensure structured blocks are correct and self-contained (Li et al., 2024, Wu et al., 16 Oct 2025).
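The NL→SQL path can be made concrete with the aggregative example from Section 1. The table schema and rows below are invented for illustration, and the SQL is hard-coded where S-RAG would have an LLM generate it:

```python
# S-RAG-style structured answering: documents have already been mapped to
# rows of a relational table; an aggregative question becomes a SQL query.
# In S-RAG the NL->SQL step is LLM-generated; here it is hard-coded, and
# the schema and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, founded INT, revenue_musd REAL)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?)",
    [("Acme", 2004, 3.0), ("Globex", 2012, 9.0), ("Initech", 1999, 5.0)],
)

# "What is the average revenue of companies founded before 2010?"
sql = "SELECT AVG(revenue_musd) FROM companies WHERE founded < 2010"
(avg_revenue,) = conn.execute(sql).fetchone()   # -> 4.0 for this toy data
```

Executing the query touches every matching row, which is what lets this approach sidestep the top-k completeness bottleneck of chunk retrieval.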

Multi-hop or global reasoning is facilitated by the capacity to encode key relations explicitly and perform attribute-wise aggregation, filtering, or computation directly over structural datasets or layout graphs.

5. Benchmarks, Evaluation, and Empirical Performance

StructRAG methods have been evaluated on a spectrum of public and custom knowledge-intensive benchmarks:

| Benchmark / Dataset | Best StructRAG Variant | Accuracy Gain over Baseline |
|---|---|---|
| HOTELS (Aggregative QA) | S-RAG-GoldSchema (Koshorek et al., 11 Nov 2025) | +43 pts over VectorRAG |
| WORLD CUP (Aggregative QA) | S-RAG-GoldSchema (Koshorek et al., 11 Nov 2025) | +18 pts over VectorRAG |
| DOCBENCH | SuperRAG (Yang et al., 28 Feb 2025) | +7.3 pts over flat RAG |
| SPIQA (Multimodal QA) | SuperRAG (Yang et al., 28 Feb 2025) | +9.0 pts on hardest partition |
| Loong Knowledge Tasks | StructRAG (Li et al., 2024) | +12–18 pts over RAG/Long-Context |
| 2WikiMultiHopQA | Structure-R1 (Wu et al., 16 Oct 2025) | 74.2 EM vs. 25.7 EM (StructRAG) |
| ASQA (Ambiguous QA) | RDR2 (Xu et al., 5 Oct 2025) | +4.4–4.5 EM over vanilla RAG |

Answer Recall, Exact Match (EM), LLM Score, and variant-specific metrics (Claim Recall, head-to-head win-rate, F1_1–5, etc.) are used. Ablation studies demonstrate that hybrid structure selection, structure-aware routing, and generative structure adaptation each contribute significant accuracy gains; router accuracy directly correlates with overall performance (Li et al., 2024, Xu et al., 5 Oct 2025).
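For reference, the Exact Match metric cited above is conventionally computed SQuAD-style: both strings are normalized (lowercased, punctuation and articles stripped, whitespace collapsed) before comparison. A minimal sketch:

```python
# SQuAD-style Exact Match (EM): normalize both strings, then compare.
import re
import string

def normalize(s: str) -> str:
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)   # drop English articles
    return " ".join(s.split())              # collapse whitespace

def exact_match(prediction: str, gold: str) -> int:
    return int(normalize(prediction) == normalize(gold))
```

Benchmark-specific variants (Claim Recall, F1, LLM-judged scores) relax this strict string equality in different ways.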

FullCorpus and chunk-only RAG baselines fail on aggregative queries due to context truncation and incomplete evidence selection. Router and structurizer modules resolve these limitations by expanding retrieval and conditionally mapping inputs to high-density structures.

6. Theoretical Foundations and Structural Reasoning

Structure-R1 grounds the StructRAG paradigm in formal analysis of information density and reasoning clarity:

  • Semantic information content I(x) and density ρ(x) = I(x)/|x| increase when documents are mapped to optimal structures, facilitating lower reasoning error.
  • RL-based structure adaptation maximizes expected information density and minimizes answer error across queries (Wu et al., 16 Oct 2025).
  • Generative schema freedom enables task-specific formats (e.g., <date_comparison>, <financial_timeline>) yielding denser, more transparent context than fixed schemas (Wu et al., 16 Oct 2025).
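The density argument can be illustrated with a toy calculation, counting extracted facts as I(x) and tokens as |x|. The fact counts and texts below are invented for illustration:

```python
# Toy illustration of rho(x) = I(x) / |x|: a structured serialization packs
# the same facts into fewer tokens than source prose, so density rises.
# Fact counts and texts are hypothetical.

def density(num_facts: int, text: str) -> float:
    """Information density: facts per whitespace-delimited token."""
    return num_facts / len(text.split())

prose = ("Acme, which was founded back in 2004, reported that its annual "
         "revenue for the year 2023 came to three million dollars.")
table = "company=Acme | founded=2004 | revenue_2023_musd=3"

rho_prose = density(2, prose)   # same 2 facts spread over many tokens
rho_table = density(2, table)   # same 2 facts in a compact serialization
```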

Document structure trees, layout graphs, and relational schemas provide explicit scaffolds for evidence aggregation and traversal, enabling not only completeness but also improved multimodal and hierarchical reasoning.

7. Limitations, Ongoing Challenges, and Future Directions

Current StructRAG methods assume moderately homogeneous corpora or a single recurring schema; multi-schema corpora and nested/list attributes remain challenging (Koshorek et al., 11 Nov 2025). Quality depends on accurate schema induction and layout parsing—errors propagate into final answers. Computational overhead for graph construction, GNN embedding, and structure selection is nontrivial (Yang et al., 28 Feb 2025).

Active research directions include:

  • Multi-schema induction (document clustering and multi-table ingestion)
  • Nested and one-to-many structural support via extended SQL/JSON or graph databases
  • End-to-end joint training of router, structurizer, and generator modules
  • Automated discovery of novel structure types (timelines, matrices, bespoke domain schemas)
  • Scaling to noisy, multimodal, and enterprise-scale corpora (legal, scientific, clinical domains)
  • Efficiency and memory optimization for very long document streams

In summary, StructRAG frameworks demonstrate that hybrid, dynamic structuring (leveraging schema induction, document graph modeling, and structure-aware reasoning) substantially mitigates the completeness and reasoning bottlenecks of standard RAG. Empirical results and theoretical backing suggest explicit structure is critical for LLMs to realize advanced global, multi-step reasoning over large heterogeneous corpora (Li et al., 2024, Koshorek et al., 11 Nov 2025, Wu et al., 16 Oct 2025, Xu et al., 5 Oct 2025, Yang et al., 28 Feb 2025).
