Hierarchical Trees in RAPTOR
- Hierarchical trees are structured models that recursively abstract and summarize data using techniques like GMM clustering and GPT-based summarization for efficient retrieval.
- In RAPTOR, documents are split into chunks, embedded with SBERT, and organized into a multi-level tree that enables precise, context-aware question answering.
- The HO-Tree in ST-Raptor extends these ideas to semi-structured tables by integrating meta and body trees to maintain structural integrity and improve answer accuracy.
Hierarchical trees in retrieval and question answering constitute a structured approach for organizing, abstracting, and querying complex information spaces, exemplified by the RAPTOR framework for text documents (Sarthi et al., 2024) and the HO-Tree representation for semi-structured tables in ST-Raptor (Tang et al., 25 Aug 2025). These models use recursive or orthogonal tree structures to abstract, summarize, and enable context-sensitive reasoning over large, heterogeneous corpora and tables, optimizing both accuracy and efficiency for retrieval-augmented LLMs.
1. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
The RAPTOR model applies hierarchical tree construction to lengthy documents, facilitating multi-granularity retrieval for LLMs. Let $D$ be a document of $n$ tokens. RAPTOR begins by segmenting $D$ into contiguous, sentence-respecting text "chunks" of at most 100 tokens. Each chunk $c_i$ is embedded using SBERT: $e_i = \mathrm{SBERT}(c_i)$, with $e_i \in \mathbb{R}^d$.
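As a concrete illustration of this leaf layer, the following minimal Python sketch packs sentences into chunks of at most 100 tokens and embeds them with an SBERT encoder; the `sentence-transformers` checkpoint name and the whitespace token proxy are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of RAPTOR's leaf layer: sentence-respecting chunking plus SBERT
# embedding. The checkpoint name and the whitespace token proxy are assumptions.
from sentence_transformers import SentenceTransformer

MAX_TOKENS = 100  # per-chunk budget from the paper

def chunk_document(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_tokens` words."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())  # crude token proxy
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in SBERT checkpoint
chunks = chunk_document(open("document.txt").read())
embeddings = model.encode(chunks)  # shape (num_chunks, d)
```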
At each recursive level $\ell$, the embeddings of the current nodes are clustered using a Gaussian Mixture Model (GMM). To address high dimensionality, UMAP-based dimensionality reduction may optionally reduce $d$. The GMM fits $K$ Gaussians to maximize the likelihood

$$P(e) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(e \mid \mu_k, \Sigma_k),$$

with $K$ chosen to minimize the Bayesian Information Criterion:

$$\mathrm{BIC} = p \ln N - 2 \ln \hat{L},$$

where $p$ is the number of free model parameters, $N$ the number of points, and $\hat{L}$ the maximized likelihood. Nodes are assigned soft cluster memberships via posteriors $P(k \mid e_i)$, with cluster sets $C_k = \{\, i : P(k \mid e_i) > \tau \,\}$ for a threshold $\tau$.
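The clustering step can be sketched with scikit-learn's `GaussianMixture` and optional UMAP reduction as below; the candidate range for $K$, the threshold $\tau$, and the target dimension are illustrative hyperparameters, not values from the paper.

```python
# Sketch of one clustering step: fit GMMs over a candidate range of K, keep the
# BIC-minimizing model, then form soft clusters from posterior responsibilities.
# max_k, tau, and the UMAP target dimension are illustrative hyperparameters.
import numpy as np
import umap
from sklearn.mixture import GaussianMixture

def cluster_level(embeddings: np.ndarray, max_k: int = 10,
                  tau: float = 0.1, reduce_dim: int | None = 10) -> list[np.ndarray]:
    # Optional UMAP reduction to tame high-dimensional embeddings.
    if reduce_dim is not None and embeddings.shape[1] > reduce_dim:
        embeddings = umap.UMAP(n_components=reduce_dim).fit_transform(embeddings)
    # Model selection: the K that minimizes BIC = p*ln(N) - 2*ln(L_hat).
    best_gmm, best_bic = None, np.inf
    for k in range(1, min(max_k, len(embeddings)) + 1):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(embeddings)
        bic = gmm.bic(embeddings)
        if bic < best_bic:
            best_gmm, best_bic = gmm, bic
    # Soft membership: node i joins cluster k whenever P(k | e_i) > tau,
    # so a single chunk may contribute to several summaries.
    resp = best_gmm.predict_proba(embeddings)  # (N, K) posterior matrix
    clusters = [np.flatnonzero(resp[:, k] > tau) for k in range(best_gmm.n_components)]
    return [c for c in clusters if len(c) > 0]
```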
Child texts in each cluster $C_k$ are concatenated and summarized using GPT-3.5-turbo, generating new summary nodes $s_k$. Each summary is embedded in turn: $e_{s_k} = \mathrm{SBERT}(s_k)$. This process recurses until a single root node (level $L$) remains or clusters become too small to split. The resulting hierarchical summary tree enables retrieval at multiple abstraction levels.
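Putting the pieces together, a hedged sketch of the recursion follows, reusing `model` and `cluster_level` from the sketches above; `summarize` is a hypothetical wrapper around a GPT-3.5-turbo call.

```python
# Sketch of the recursion: cluster the current level, summarize each cluster,
# embed the summaries, repeat until a single node remains.
def summarize(text: str) -> str:
    """Hypothetical abstractive-summary LLM call (e.g., GPT-3.5-turbo)."""
    raise NotImplementedError

def build_tree(chunks: list[str]) -> list[list[str]]:
    levels = [chunks]  # level 0: original text chunks
    while len(levels[-1]) > 1:
        current = levels[-1]
        clusters = cluster_level(model.encode(current))
        if all(len(c) <= 1 for c in clusters):  # clusters too small to split further
            break
        # Concatenate each cluster's child texts and summarize abstractively.
        levels.append([summarize(" ".join(current[i] for i in idx))
                       for idx in clusters])
    return levels  # levels[-1] holds the root-most summaries
```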
2. Formal Tree Structure, Inference, and Query Algorithms
Formally, the tree comprises $L+1$ levels indexed $\ell = 0, \dots, L$. Level-0 nodes correspond to original text chunks; each higher level is built from GMM-clustered, abstractive summaries. A node at level $\ell > 0$ has as children the level-$(\ell-1)$ nodes whose texts it summarizes.
Querying in RAPTOR proceeds by embedding the query $q$ to $e_q = \mathrm{SBERT}(q)$. Each node $v$ with embedding $e_v$ is scored by cosine similarity:

$$\mathrm{sim}(q, v) = \frac{e_q \cdot e_v}{\lVert e_q \rVert \, \lVert e_v \rVert}.$$
Two retrieval modes are defined:
- Tree traversal: Starting at the root level ($\ell = L$), the top-$k$ nodes by score are selected; their children are then scored at the next level down, again keeping the top-$k$ at each step. The context for the LLM is the union of texts from selected nodes across all levels.
- Collapsed-tree retrieval: All nodes across all levels are pooled, and the highest-scoring nodes are chosen in descending order until a global token budget is reached. Both modes are sketched in code below.
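The following is a hedged Python sketch of both modes over a flat node list; the node layout, the defaults ($k=3$, a 2000-token budget), and the whitespace token proxy are our assumptions, not RAPTOR's published implementation.

```python
# Hedged sketch of RAPTOR's two retrieval modes over a flat node list.
import numpy as np

# Each node: {"text": str, "emb": np.ndarray, "children": list[int]}
def score(q_emb: np.ndarray, node: dict) -> float:
    e = node["emb"]  # cosine similarity, as defined above
    return float(q_emb @ e / (np.linalg.norm(q_emb) * np.linalg.norm(e) + 1e-9))

def tree_traversal(q_emb, nodes, roots, k=3):
    """Top-k at the root level, then top-k among the selected nodes' children."""
    picked, frontier = [], list(roots)
    while frontier:
        frontier.sort(key=lambda i: -score(q_emb, nodes[i]))
        chosen = frontier[:k]
        picked.extend(chosen)
        # Soft clustering means parents can share children; dedupe the frontier.
        frontier = list({c for i in chosen for c in nodes[i]["children"]})
    return [nodes[i]["text"] for i in picked]

def collapsed_tree(q_emb, nodes, token_budget=2000):
    """Pool all nodes from all levels; take best-first until the budget is hit."""
    order = sorted(range(len(nodes)), key=lambda i: -score(q_emb, nodes[i]))
    context, used = [], 0
    for i in order:
        n_tok = len(nodes[i]["text"].split())  # crude token proxy
        if used + n_tok > token_budget:
            break
        context.append(nodes[i]["text"])
        used += n_tok
    return context
```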
Empirical evidence indicates collapsed-tree retrieval achieves higher answer accuracy, while tree traversal provides a deterministic per-level quota and lower computational overhead when $kL \ll N$ (with $N$ the total number of tree nodes), especially in large-document scenarios (Sarthi et al., 2024).
3. Computational Complexity and Trade-Offs
The RAPTOR build time consists of $O(m \cdot c_{\mathrm{enc}})$ for chunk embedding (where $m$ is the number of chunks and $c_{\mathrm{enc}}$ the per-chunk encoder cost), a per-level clustering cost on the order of $O(N_\ell K d)$ per EM iteration for $N_\ell$ level-$\ell$ nodes, and a summarization overhead proportional to LLM-invocation token counts per level. Empirically, wall-clock and token costs scale linearly with document length $n$.
During retrieval:
- Flat retrieval (baseline, e.g., DPR/BM25): $O(N)$ scoring plus $O(N \log N)$ sorting over all $N$ retrieval units.
- Tree traversal: $O(k\,b\,L)$ for mean branching factor $b$, typically much less than $O(N)$ for small $k$ and moderate $L$.
- Collapsed-tree retrieval: $O(N)$ to score, or $O(N \log N)$ with sorting, over all tree nodes.
FAISS or approximate $k$-NN search can accelerate all approaches.
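For example, collapsed-tree scoring can be delegated to a FAISS inner-product index, as in the sketch below; unit-normalizing embeddings makes inner product equal cosine similarity. `IndexFlatIP` is exact, while approximate indexes (e.g., `IndexHNSWFlat`) trade a little recall for sub-linear search.

```python
# Sketch of FAISS-accelerated node scoring for collapsed-tree retrieval.
import faiss
import numpy as np

def build_index(node_embs: np.ndarray) -> faiss.Index:
    embs = node_embs / np.linalg.norm(node_embs, axis=1, keepdims=True)
    index = faiss.IndexFlatIP(embs.shape[1])  # exact inner-product index
    index.add(embs.astype(np.float32))
    return index

def top_nodes(index: faiss.Index, q_emb: np.ndarray, k: int = 20):
    q = (q_emb / np.linalg.norm(q_emb)).astype(np.float32)[None, :]
    scores, ids = index.search(q, k)  # returns (similarities, node indices)
    return ids[0], scores[0]
```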
Key trade-offs are summarized in the following table:
| Retrieval Mode | Granularity | Speed | Accuracy (Empirical) |
|---|---|---|---|
| Collapsed-tree | Flexible | Moderate | Highest |
| Tree traversal | Fixed per-level | Fast (when $kL \ll N$) | Lower (but scalable) |
Collapsed-tree retrieval is most accurate; tree traversal is optimal when deterministic quotas and speed are required (Sarthi et al., 2024).
4. Hierarchical Trees for Semi-Structured Tables: HO-Tree in ST-Raptor
ST-Raptor generalizes hierarchical tree frameworks to semi-structured tables, formulating the Hierarchical Orthogonal Tree (HO-Tree) representation (Tang et al., 25 Aug 2025). For a table $T$, the HO-Tree is a triple

$$\mathcal{H} = (T_{\mathrm{meta}}, T_{\mathrm{body}}, \phi),$$

where:
- $T_{\mathrm{meta}}$: the Meta-Tree, representing headers and their hierarchical containment.
- $T_{\mathrm{body}}$: the Body-Tree, representing content cells as paths (rows) in a row-oriented trie.
- $\phi$: a mapping from each meta-tree leaf (resolved header) to a body-tree level.
This design encodes both hierarchical header structure and side-by-side orthogonal table sections, accommodating multi-row/column spans, arbitrary merged cells, and recursive subtables.
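A minimal Python rendering of this triple might look as follows; the class and field names are ours, chosen to mirror the definitions above rather than ST-Raptor's actual code.

```python
# Illustrative rendering of the HO-Tree triple (meta-tree, body-tree, mapping).
from dataclasses import dataclass, field

@dataclass
class MetaNode:
    header: str                                                # header text
    children: list["MetaNode"] = field(default_factory=list)  # containment

@dataclass
class BodyNode:
    value: str | None = None                    # None for the synthetic root
    children: list["BodyNode"] = field(default_factory=list)

@dataclass
class HOTree:
    meta: MetaNode            # T_meta: hierarchical header containment
    body: BodyNode            # T_body: row-oriented trie of content cells
    level_of: dict[str, int]  # phi: header leaf -> body-tree depth it indexes
```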
Algorithmically, HO-Tree construction proceeds by meta-information detection (via VLMs and embedding-based header identification), recursive table partitioning according to merged-cell and header-orientation principles, and depth-first construction, producing a forest of HO-Trees under a synthetic root as needed. Each cell is processed $O(1)$ times, so construction is linear in the total number of cells $n_c$, with embedding dominating the cost at $O(n_c)$ encoder calls.
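The row-oriented trie at the heart of the Body-Tree can be sketched as below, reusing `BodyNode` from the previous sketch; the VLM-based meta detection and recursive partitioning steps are omitted. Each cell is touched once, matching the linear construction cost.

```python
# Sketch of body-trie construction: each row becomes a root-to-leaf path, and
# rows sharing leading values (e.g., merged-cell spans) share a prefix.
def insert_row(root: "BodyNode", row: list[str]) -> None:
    node = root
    for cell in row:
        match = next((c for c in node.children if c.value == cell), None)
        if match is None:            # open a new branch for an unseen value
            match = BodyNode(value=cell)
            node.children.append(match)
        node = match                 # shared prefix == merged-cell span

body_root = BodyNode()
for row in [["Alice", "Math", "91"], ["Alice", "Physics", "88"]]:
    insert_row(body_root, row)       # the "Alice" prefix is stored once
```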
5. Operations and Pipelines over Hierarchical Trees
ST-Raptor exposes a formal language of atomic tree operations, composable into complex pipelines for LLM-guided question answering. The basic operations include the following (a code sketch follows the list):
- $\mathsf{Child}(n)$: child subtree retrieval for meta-node $n$.
- $\mathsf{Ancestor}(n)$: ancestor subtree retrieval for meta-node $n$.
- $\mathsf{Value}(n, v)$: value extraction, returning all body-nodes at meta-node $n$'s associated level with row-ancestor value $v$.
- $\mathsf{Filter}(p)$: data filtering by predicate $p$.
- $\mathsf{Agg}(\cdot)$: numeric aggregation.
- $\mathsf{Compare}(\cdot)$: set comparison.
- $\mathsf{Map}(\cdot)$: map operation.
- $\mathsf{Align}(\cdot)$: parameter alignment.
- $\mathsf{Reason}(D, q)$: LLM-based reasoning over data $D$ for query $q$.
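Below is a hedged sketch of a few of these operations over the `HOTree` classes from the earlier sketch; the function names and signatures paraphrase the prose descriptions above and are not ST-Raptor's actual operation grammar.

```python
# Hedged sketch of a few atomic HO-Tree operations.
def child(n: "MetaNode") -> list["MetaNode"]:
    """Child subtree retrieval for a meta-node."""
    return n.children

def extract_values(tree: "HOTree", header: str, row_ancestor: str) -> list[str]:
    """Value extraction: body values at `header`'s mapped depth whose row path
    passes through a node equal to `row_ancestor`."""
    depth, out = tree.level_of[header], []
    def walk(node: "BodyNode", d: int, seen: bool) -> None:
        seen = seen or node.value == row_ancestor
        if d == depth and seen and node.value is not None:
            out.append(node.value)
        for c in node.children:
            walk(c, d + 1, seen)
    walk(tree.body, 0, False)
    return out

def filter_values(values: list[str], pred) -> list[str]:
    """Data filtering by predicate."""
    return [v for v in values if pred(v)]

def aggregate(values: list[str], fn=sum) -> float:
    """Numeric aggregation over extracted cell values."""
    return fn(float(v) for v in values)
```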
Given a natural language question $Q$, ST-Raptor (1) decomposes $Q$ into sub-questions using few-shot prompting and retrieved exemplars, (2) generates atomic-operation statements for each, and (3) sequentially executes these over the HO-Tree, invoking forward and backward verification mechanisms to ensure correctness and stability of answers (Tang et al., 25 Aug 2025).
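This loop might be sketched as follows, where `llm`, `execute`, and `verify` are hypothetical stand-ins for ST-Raptor's prompted LLM calls, its operation interpreter, and its forward/backward verification stages; the prompt strings are illustrative only.

```python
# Speculative sketch of the decompose-generate-execute-verify loop.
def answer(question: str, tree: "HOTree", llm, execute, verify) -> str:
    # (1) Decompose the question via few-shot prompting.
    sub_questions = llm(f"Decompose into sub-questions: {question}").splitlines()
    results = []
    for sq in sub_questions:
        # (2) Generate an atomic-operation program for this sub-question.
        program = llm(f"Emit tree operations for: {sq}\nSo far: {results}")
        # (3) Execute over the HO-Tree; regenerate once if verification fails.
        value = execute(program, tree)
        if not verify(sq, program, value):   # forward/backward checks
            value = execute(llm(f"Retry with feedback: {sq}"), tree)
        results.append(value)
    return llm(f"Combine into a final answer: {results}")
```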
6. Empirical Results and Impact
Controlled experiments confirm the efficacy of the hierarchical tree approach. RAPTOR achieves state-of-the-art results on multi-step reasoning question answering tasks. Notably:
- On QuALITY (5k-token passages), DPR+GPT-4 yields 60.4% accuracy, RAPTOR+GPT-4 62.4% (+2.0 pp); on the QuALITY-HARD subset, performance improves from 54.7% to 56.6%.
- On QASPER, RAPTOR+GPT-4 obtains 55.7 F1 (DPR+GPT-4: 53.0 F1).
- On NarrativeQA, RAPTOR+UnifiedQA improves ROUGE-L, BLEU-1, and METEOR by roughly 0.7–1 points over strong baselines.
Coupling RAPTOR retrieval with GPT-4 yields an absolute 20-point lift in QuALITY accuracy over the previous SOTA (62.3% to 82.6%), demonstrating RAPTOR’s utility for multi-step, thematic reasoning over long contexts (Sarthi et al., 2024).
ST-Raptor, leveraging the HO-Tree, attains up to 20% higher answer accuracy than nine other baselines in the semi-structured table setting, as measured on the SSTQA dataset with 764 questions across 102 real-world tables (Tang et al., 25 Aug 2025).
7. Significance and Generalizations
Hierarchical trees in RAPTOR and ST-Raptor provide an explicit, recursive abstraction of the underlying information space, decoupling summary granularity and query expressivity from fixed chunking strategies. This enables robust retrieval and reasoning, supporting complex question decomposition, abstraction, and compositional generalization. The models natively accommodate multi-modal, recursive, and thematic content, and their modular tree operations and layouts offer principled mechanisms for aligning LLM inference with the original data structure. This suggests that recursive trees may remain foundational for scalable, interpretable retrieval-augmented language modeling in both textual and tabular domains, particularly where multi-step reasoning and layout complexity are central.
Further developments may refine dynamic tree construction, LLM-guided pipeline generation, and cross-modal schema induction, leveraging the demonstrated empirical effectiveness and formal flexibility of hierarchical tree representations (Sarthi et al., 2024, Tang et al., 25 Aug 2025).