RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval (2401.18059v1)

Published 31 Jan 2024 in cs.CL and cs.LG

Abstract: Retrieval-augmented LLMs can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy.

Introduction to RAPTOR

Retrieval-augmented language models (LMs) enhance performance by supplementing their pre-encoded knowledge with data drawn from external corpora. Standard retrieval methods, however, fetch short, contiguous text snippets and therefore fail to capture the full document context. RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) addresses this limitation by recursively embedding, clustering, and summarizing text to construct a tree, which lets it retrieve information from lengthy documents at multiple levels of abstraction. The key contribution is RAPTOR's ability to surface contextually rich content that spans discrete portions of a document.

Comparison with Existing Methods

RAPTOR extends current retrieval systems by delivering state-of-the-art performance, particularly on complex, multi-step reasoning tasks. In the reported experiments, coupling RAPTOR with GPT-4 improved the best absolute accuracy on the QuALITY benchmark by 20%. Such tasks demand comprehensive document understanding and require integrating knowledge from disparate parts of the text. By retrieving recursive summaries, RAPTOR supplies context at multiple levels of granularity and outperforms existing retrieval-augmented methods.

RAPTOR's Technical Concept

Building the RAPTOR retrieval tree starts with segmenting the text into short chunks that are embedded with Sentence-BERT (SBERT). The chunks are then clustered on their embeddings using a Gaussian Mixture Model (GMM), and an LLM summarizes each cluster. The tree is built bottom-up by re-embedding, clustering, and summarizing the resulting summaries until further clustering is no longer viable. Two querying methods are employed: tree traversal and collapsed tree. The latter, which performs notably better, flattens the tree into a single layer and retrieves the most relevant nodes until a token threshold is reached, keeping the retrieved context within the model's input limit.
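The Python sketch below illustrates this bottom-up construction under simplifying assumptions: it uses a fixed SBERT checkpoint from sentence-transformers, a plain GMM with a heuristic cluster count and hard assignments (the paper additionally applies UMAP dimensionality reduction, BIC-based model selection, and soft cluster membership), and a placeholder `summarize` function standing in for the LLM call. It is not the authors' implementation.

```python
# Illustrative RAPTOR-style tree construction (simplified sketch, not the authors' code).
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT checkpoint

def summarize(texts: list[str]) -> str:
    """Placeholder for an LLM call that abstracts a cluster into one summary."""
    raise NotImplementedError("plug in an LLM call here")

def build_layer(nodes: list[str]) -> list[str]:
    """Embed the nodes, cluster them with a GMM, and summarize each cluster."""
    embeddings = encoder.encode(nodes)
    n_clusters = max(1, len(nodes) // 5)  # heuristic; the paper selects this via BIC
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(embeddings)
    labels = gmm.predict(embeddings)
    return [summarize([nodes[i] for i in np.flatnonzero(labels == c)])
            for c in range(n_clusters)]

def build_tree(chunks: list[str]) -> list[list[str]]:
    """Recursively embed, cluster, and summarize until a layer is too small to cluster."""
    layers = [chunks]
    while len(layers[-1]) > 5:  # stop once further clustering is no longer meaningful
        layers.append(build_layer(layers[-1]))
    return layers  # layers[0] = leaf chunks, layers[-1] = most abstract summaries
```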

Superiority and Scalability of RAPTOR

RAPTOR consistently outperforms baselines across datasets, retrieval systems, and LLMs. Notably, pairing RAPTOR with GPT-4 yields a marked jump in F1 Match scores. Ablation analyses confirm that nodes from different layers of the tree all contribute: querying the collapsed full tree gives better results than restricting retrieval to any single layer. In terms of scalability, build time and token expenditure grow linearly with document length, so RAPTOR can process large, complex corpora efficiently.
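As a rough illustration of the collapsed-tree querying used in these experiments, the sketch below flattens all layers into one pool, ranks nodes by cosine similarity to the query embedding, and adds nodes until a token budget is exhausted. The encoder choice and the whitespace-based token count are assumptions made for brevity; a real implementation would use the downstream model's tokenizer.

```python
# Illustrative collapsed-tree retrieval under a token budget (sketch, not the authors' code).
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT checkpoint

def collapsed_tree_retrieve(query: str, layers: list[list[str]],
                            token_budget: int = 2000) -> list[str]:
    """Collapse all tree layers into one pool, rank nodes by cosine similarity
    to the query, and greedily add nodes until the token budget is spent."""
    nodes = [node for layer in layers for node in layer]
    node_emb = encoder.encode(nodes)
    query_emb = encoder.encode([query])[0]
    sims = node_emb @ query_emb / (
        np.linalg.norm(node_emb, axis=1) * np.linalg.norm(query_emb) + 1e-9)
    selected, used = [], 0
    for idx in np.argsort(-sims):                  # most similar node first
        cost = len(nodes[idx].split())             # crude proxy for a token count
        if used + cost > token_budget:
            break
        selected.append(nodes[idx])
        used += cost
    return selected
```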

Conclusion and Outlook

RAPTOR sets a new standard for retrieval-augmented LM systems. By providing precise context at varying levels of abstraction, it strengthens question answering over long documents and demonstrates the combined value of recursive summarization and structured context retrieval. The reported results show gains in both accuracy and efficiency, and the approach points toward retrieval pipelines that capture document structure rather than isolated snippets.

Authors (6)
  1. Parth Sarthi (1 paper)
  2. Salman Abdullah (2 papers)
  3. Aditi Tuli (1 paper)
  4. Shubh Khanna (2 papers)
  5. Anna Goldie (19 papers)
  6. Christopher D. Manning (169 papers)
Citations (70)