LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering (2410.18050v2)

Published 23 Oct 2024 in cs.CL

Abstract: Long-Context Question Answering (LCQA), a challenging task, aims to reason over long-context documents to yield accurate answers to questions. Existing long-context LLMs for LCQA often struggle with the "lost in the middle" issue. Retrieval-Augmented Generation (RAG) mitigates this issue by providing external factual evidence. However, its chunking strategy disrupts the global long-context information, and its low-quality retrieval in long contexts hinders LLMs from identifying effective factual details due to substantial noise. To this end, we propose LongRAG, a general, dual-perspective, and robust LLM-based RAG system paradigm for LCQA to enhance RAG's understanding of complex long-context knowledge (i.e., global information and factual details). We design LongRAG as a plug-and-play paradigm, facilitating adaptation to various domains and LLMs. Extensive experiments on three multi-hop datasets demonstrate that LongRAG significantly outperforms long-context LLMs (up by 6.94%), advanced RAG (up by 6.16%), and Vanilla RAG (up by 17.25%). Furthermore, we conduct quantitative ablation studies and multi-dimensional analyses, highlighting the effectiveness of the system's components and fine-tuning strategies. Data and code are available at https://github.com/QingFei1/LongRAG.

LongRAG: A Comprehensive Approach to Long-Context Question Answering

The paper "LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering" tackles the challenge of Long-Context Question Answering (LCQA), which requires reasoning over extensive documents to produce precise answers to queries. Existing long-context LLM approaches face notable limitations, including the "lost in the middle" issue, where models struggle to use relevant information located in the middle of a long input rather than at its beginning or end.

Contribution

The primary contribution of this paper is the introduction of LongRAG, a robust paradigm designed to improve how retrieval-augmented generation (RAG) systems understand and process long-context data. The work stands out by addressing two key limitations of traditional RAG systems:

  1. Inadequate Chunking Strategy: Conventional chunking methods can disrupt global contextual understanding, causing models to miss critical connections between facts spread across the text (a toy illustration follows this list).
  2. Noise Management: High noise levels within long documents make it difficult for LLMs to extract meaningful information accurately.
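
To make the chunking limitation concrete, the sketch below shows how naive fixed-size chunking can split a single supporting fact across chunk boundaries. The example text, chunk size, and helper name are illustrative assumptions, not taken from the paper.

```python
def chunk_fixed(text: str, size: int = 64) -> list[str]:
    """Naive fixed-size character chunking with no overlap (illustrative only)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

document = (
    "Alice founded Acme Corp in 1998. Years later, the company moved its "
    "headquarters to Berlin, where its founder still lives today."
)

for i, chunk in enumerate(chunk_fixed(document)):
    print(f"chunk {i}: {chunk!r}")

# The link between "Alice" in the first chunk and "its founder" in a later
# chunk is lost once chunks are embedded and scored independently, which is
# exactly the kind of global connection LongRAG aims to preserve.
```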

System Overview

LongRAG presents a novel architecture composed of four key components that together ensure effective processing of long-context documents (an illustrative end-to-end sketch follows the list):

  • Hybrid Retriever: Utilizes a dual-encoder and cross-encoder setup for efficient and accurate retrieval.
  • LLM-augmented Information Extractor: Regenerates global context information from the retrieved chunks, preserving semantic coherence and enabling comprehensive information extraction.
  • CoT-guided Filter: Employs Chain of Thought (CoT) reasoning to dynamically assess chunk relevance and filter out non-essential content, enhancing the density of evidence used in answer generation.
  • LLM-augmented Generator: Integrates insights from global context and factual detail to produce accurate answers.
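
The following is a minimal, hedged sketch of how these four stages could fit together. Every function, prompt, and parameter name here (e.g. `hybrid_retrieve`, `cot_filter`, the `embed`/`rerank`/`llm` callables) is a hypothetical placeholder; it mirrors only the data flow described above, not the authors' implementation or prompts.

```python
from typing import Callable, List

def hybrid_retrieve(question: str, chunks: List[str],
                    embed: Callable[[str], List[float]],
                    rerank: Callable[[str, str], float],
                    k_coarse: int = 20, k_fine: int = 5) -> List[str]:
    """Dual-encoder recall followed by cross-encoder re-ranking."""
    q_vec = embed(question)

    def cos(a: List[float], b: List[float]) -> float:
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0

    coarse = sorted(chunks, key=lambda c: cos(q_vec, embed(c)), reverse=True)[:k_coarse]
    return sorted(coarse, key=lambda c: rerank(question, c), reverse=True)[:k_fine]

def extract_global_info(question: str, chunks: List[str],
                        llm: Callable[[str], str]) -> str:
    """LLM-augmented extractor: rebuild coherent global background from the
    retrieved chunks (prompt wording is a placeholder)."""
    return llm("Summarize the background relevant to: " + question
               + "\n\n" + "\n\n".join(chunks))

def cot_filter(question: str, chunks: List[str],
               llm: Callable[[str], str]) -> List[str]:
    """CoT-guided filter: keep only chunks the LLM judges as supporting evidence."""
    kept = []
    for c in chunks:
        verdict = llm(f"Question: {question}\nChunk: {c}\n"
                      "Think step by step, then answer 'yes' or 'no': "
                      "does this chunk help answer the question?")
        if "yes" in verdict.lower():
            kept.append(c)
    return kept

def answer(question: str, chunks: List[str], embed, rerank,
           llm: Callable[[str], str]) -> str:
    """Generator: combine global background with filtered factual evidence."""
    retrieved = hybrid_retrieve(question, chunks, embed, rerank)
    global_info = extract_global_info(question, retrieved, llm)
    evidence = cot_filter(question, retrieved, llm)
    return llm(f"Background: {global_info}\nEvidence: {'; '.join(evidence)}\n"
               f"Question: {question}\nAnswer concisely:")
```

In the actual system, the extractor and filter are LLM components (optionally fine-tuned) and the generator consumes both the global information and the filtered evidence; the sketch only reflects that dual-perspective flow.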

Experimental Validation

The paper validates LongRAG through rigorous experimentation on three multi-hop datasets from LongBench, demonstrating its superior performance. Key findings include:

  • Performance Gains: LongRAG achieves significant improvements over baseline models, with increases of up to 6.94% compared to long-context LLMs, 6.16% over advanced RAG systems, and 17.25% relative to Vanilla RAG.
  • Robustness and Flexibility: Ablation studies confirm the efficacy of individual components and underscore the system's robustness across various long-context scenarios.
  • Efficiency: LongRAG maintains high performance while reducing token input to the generator, highlighting an efficient processing approach with minimal redundancy.

Implications and Future Prospects

Practically, LongRAG's design as a plug-and-play system allows for broad adaptability across different domains and compatibility with various LLMs, increasing its applicability in diverse real-world scenarios. Theoretically, the dual-perspective retrieval strategy marks a significant step forward in RAG methodologies, suggesting potential new avenues for research into complex information retrieval and generation tasks.

Future research could explore adaptive multi-round retrieval strategies to further enhance component interactions within dynamic information landscapes. Moreover, systematic evaluation of cross-domain transferability would help establish LongRAG's utility across other NLP applications.

Conclusion

Overall, LongRAG emerges as a robust framework advancing the state-of-the-art in LCQA by integrating retrieval and generation components through a novel dual-perspective approach. This work contributes significantly to ongoing efforts aimed at refining LLM capabilities in handling extensive, complex informational contexts.

Authors (7)
  1. Qingfei Zhao
  2. Ruobing Wang
  3. Yukuo Cen
  4. Daren Zha
  5. Shicheng Tan
  6. Yuxiao Dong
  7. Jie Tang