A Survey on Retrieval-Augmented Text Generation for Large Language Models (2404.10981v2)

Published 17 Apr 2024 in cs.IR, cs.AI, and cs.CL

Abstract: Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements to address the static limitations of LLMs by enabling the dynamic integration of up-to-date external information. This methodology, focusing primarily on the text domain, provides a cost-effective solution to the generation of plausible but possibly incorrect responses by LLMs, thereby enhancing the accuracy and reliability of their outputs through the use of real-world data. As RAG grows in complexity and incorporates multiple concepts that can influence its performance, this paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation, offering a detailed perspective from the retrieval viewpoint. It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies. Additionally, the paper introduces evaluation methods for RAG, addressing the challenges faced and proposing future research directions. By offering an organized framework and categorization, the study aims to consolidate existing research on RAG, clarify its technological underpinnings, and highlight its potential to broaden the adaptability and applications of LLMs.

Examining the Evolution and Mechanisms of Retrieval-Augmented Generation (RAG) for Enhancing LLMs

Introduction

The paper explores the advancements and methodologies of Retrieval-Augmented Generation (RAG), focusing on its role in overcoming the limitations of LLMs due to static training datasets. By integrating dynamic, up-to-date external information, RAG addresses response accuracy issues in LLMs, such as poor performance in specialized domains and "hallucinations" of plausible but incorrect answers. The analysis spans the pre-retrieval, retrieval, post-retrieval, and generation stages, offering a comprehensive framework for RAG application in text-based AI systems, with insights into multimodal applications.

RAG Implementation Framework

The paper organizes RAG implementation into a four-phase structure (a minimal code sketch follows the list):

  • Pre-Retrieval: Operations like indexing and query manipulation prepare the system for efficient information retrieval.
  • Retrieval: This phase employs methods to search and rank data relevant to the input query, leveraging both traditional techniques like BM25 and newer semantic-oriented models such as BERT.
  • Post-Retrieval: Re-ranking and filtering refine the retrieved content before it is passed to the generation phase.
  • Generation: The final text output is generated, merging retrieved information with the original query to produce accurate and contextually appropriate responses.
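
To make the four phases concrete, here is a minimal, self-contained sketch of the pipeline. It illustrates the survey's framework rather than any surveyed system's code: the scoring function is a toy TF-IDF-style stand-in for BM25, and `rewrite_query`, `rerank`, and `generate` are hypothetical placeholders for the query-manipulation, re-ranking, and LLM-generation components.

```python
# Minimal, self-contained sketch of the four-phase RAG pipeline described
# above. All function names here (rewrite_query, rerank, generate) are
# illustrative placeholders, not APIs from any surveyed system, and the
# scoring function is a toy TF-IDF-style stand-in for BM25.
import math
from collections import Counter

DOCS = [
    "RAG augments a language model with retrieved documents.",
    "BM25 is a classic lexical ranking function.",
    "Dense retrievers embed queries and passages with BERT.",
]

def tokenize(text: str) -> list[str]:
    return text.lower().split()

# --- Pre-retrieval: build an index and manipulate the incoming query ---
INDEX = [Counter(tokenize(d)) for d in DOCS]    # per-document term counts
DF = Counter(t for doc in INDEX for t in doc)   # document frequency per term

def rewrite_query(query: str) -> str:
    # Placeholder for query expansion/rewriting (often done with an LLM).
    return query.strip()

# --- Retrieval: score and rank documents against the query ---
def score(query_tokens: list[str], doc: Counter) -> float:
    n = len(DOCS)
    return sum(doc[t] * math.log(1 + n / DF[t]) for t in query_tokens if t in doc)

def retrieve(query: str, k: int = 2) -> list[str]:
    q = tokenize(rewrite_query(query))
    order = sorted(range(len(DOCS)), key=lambda i: -score(q, INDEX[i]))
    return [DOCS[i] for i in order[:k]]

# --- Post-retrieval: re-rank and filter the candidates ---
def rerank(query: str, candidates: list[str]) -> list[str]:
    # Placeholder: a cross-encoder or LLM scorer would normally go here.
    q = tokenize(query)
    return [c for c in candidates if score(q, Counter(tokenize(c))) > 0]

# --- Generation: fuse retrieved context with the original query ---
def generate(query: str, context: list[str]) -> str:
    # Stand-in for an LLM call: a real system would send this prompt
    # to a generator model instead of returning it.
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

question = "What does BM25 do?"
print(generate(question, rerank(question, retrieve(question))))
```

In practice, each placeholder is swapped for the techniques the survey catalogues: BM25 or dense BERT-based retrievers for retrieval, learned re-rankers for post-retrieval, and an LLM prompt for generation.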

Evaluation of RAG Systems

The paper discusses evaluation methods from two complementary angles (a toy metric sketch follows the list):

  • Segmented Analysis: Examining the retrieval and generation components individually, so that performance on tasks such as question answering can be attributed to the right stage.
  • End-to-End Evaluation: Evaluating the system's overall performance to ensure the coherence and correctness of generated responses.
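
To make the distinction concrete, the sketch below pairs one representative metric with each view. The specific metric choices (recall@k for the retriever, exact match for the full system) are common examples for QA-style RAG evaluation, not a protocol prescribed by the paper.

```python
# Toy illustration of the two evaluation views above. Metric choices are
# illustrative assumptions, not the paper's prescribed protocol.

def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Segmented analysis: fraction of gold passages the retriever surfaced."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / max(len(relevant_ids), 1)

def exact_match(prediction: str, gold: str) -> bool:
    """End-to-end evaluation: does the generated answer match the reference?"""
    norm = lambda s: " ".join(s.lower().split())
    return norm(prediction) == norm(gold)

# Segmented: the retriever found 1 of 2 relevant passages in its top 3.
print(recall_at_k(["d3", "d1", "d7"], {"d1", "d9"}, k=3))  # 0.5
# End-to-end: the system's final answer is judged against the reference.
print(exact_match("Paris ", "paris"))  # True
```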

Impact and Theoretical Implications

The integration of retrieval mechanisms within LLMs presents both practical and theoretical implications:

  • Practical Applications: Enhancing the adaptability of LLMs in various domains by integrating real-time data, thus maintaining their relevance over time.
  • Theoretical Advancements: RAG prompts re-evaluation of traditional LLM architectures, proposing hybrid models that dynamically interact with external data sources.

Future Prospects and Developments

Looking ahead, the paper suggests expanding RAG applications beyond text to include multimodal data, which could revolutionize areas like interactive AI and automated content creation. Advances in retrieval methods, such as differentiable search indices and integration of generative models, hold promise for further enhancing the precision and efficiency of RAG systems.

Conclusion

This paper provides a structured examination and categorization of RAG methodologies, offering a detailed analysis from a retrieval perspective. By discussing RAG's evolution, categorizing its mechanisms, and highlighting its impact, this paper serves as a crucial resource for researchers aiming to advance the functionality and application of LLMs through retrieval-augmented technologies.

Authors
  1. Yizheng Huang
  2. Jimmy Huang