
Retrieval-Augmented Generation for Large Language Models: A Survey (2312.10997v5)

Published 18 Dec 2023 in cs.CL and cs.AI

Abstract: Large language models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing Naive RAG, Advanced RAG, and Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes retrieval, generation, and augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, the paper introduces up-to-date evaluation frameworks and benchmarks. Finally, it delineates the challenges currently faced and points out prospective avenues for research and development.

Overview of Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) combines the parametric knowledge stored in an LLM's weights with non-parametric knowledge retrieved from external sources to improve text generation. By grounding the output in external data, RAG gives responses a verifiable basis and often reduces hallucination, where models generate plausible but false information. Because the external index can be updated independently of the model, RAG is well suited to providing up-to-date information and transparent outputs that can be traced back to the source material.
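
As a concrete illustration, the sketch below shows a minimal retrieve-then-generate loop. The toy document store, the keyword-overlap scoring, and the `llm_generate` stub are assumptions made for this example, not components described in the survey.

```python
# Minimal retrieve-then-generate sketch (illustrative only).
# `llm_generate` stands in for any LLM completion call and is assumed here.

def llm_generate(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

DOCS = {
    "doc1": "RAG grounds LLM outputs in retrieved passages.",
    "doc2": "External indexes can be refreshed without retraining the model.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        DOCS.items(),
        key=lambda kv: len(q_terms & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(query: str) -> str:
    """Build a grounded prompt so the output can be traced to its sources."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    prompt = (
        "Answer using only the context below and cite the [doc ids].\n"
        f"{context}\n\nQuestion: {query}"
    )
    return llm_generate(prompt)
```

The key design point is that the generator only sees retrieved passages plus the question, so every claim in the output can, in principle, be traced back to a document id.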

Paradigms of RAG

RAG development has progressed from Naive RAG to more sophisticated paradigms, namely Advanced RAG and Modular RAG. Naive RAG follows a simple retrieve-then-generate pipeline: a retriever fetches documents relevant to the query, and a generator conditions on them to produce the response. Its limitations, such as imprecise retrieval and redundant or noisy context, paved the way for Advanced RAG, which optimizes the pipeline with pre-retrieval methods such as indexing and query optimization, and refines the retrieval step itself with techniques like recursive retrieval.
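
The sketch below illustrates how pre-retrieval and post-retrieval steps can wrap a basic retriever; the query rewriter and lexical reranker are toy stand-ins for the learned components the survey covers, not the survey's own implementations.

```python
# Advanced RAG sketch: pre-retrieval query rewriting, then post-retrieval
# reranking of the candidate passages. All components are illustrative stubs.

def rewrite_query(query: str) -> str:
    """Pre-retrieval step: e.g. expand or normalize the user query."""
    return query.replace("what's", "what is")  # placeholder rewrite

def rerank(query: str, passages: list[str], top_n: int = 3) -> list[str]:
    """Post-retrieval step: reorder candidates by a simple relevance score."""
    q_terms = set(query.lower().split())
    return sorted(
        passages,
        key=lambda p: len(q_terms & set(p.lower().split())),
        reverse=True,
    )[:top_n]

def advanced_retrieve(query: str, base_retriever) -> list[str]:
    candidates = base_retriever(rewrite_query(query), k=10)  # recall-oriented
    return rerank(query, candidates)                         # precision-oriented
```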

Modular RAG advances the concept further by decomposing the pipeline into modules (for example, search, memory, and routing components) that can be added, swapped, or reconfigured for a specific task, offering greater flexibility and efficiency.
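
One way to picture this reconfigurability is a pipeline of interchangeable modules sharing a common interface, as in the hypothetical sketch below; the module names and state layout are assumptions for illustration only.

```python
# Modular RAG sketch: modules share a state dict and can be reordered or
# swapped per task. Module names below are hypothetical.
from typing import Callable

Module = Callable[[dict], dict]  # each module reads and updates shared state

def run_pipeline(query: str, modules: list[Module]) -> dict:
    state = {"query": query, "passages": [], "answer": None}
    for module in modules:
        state = module(state)
    return state

def dense_retriever(state: dict) -> dict:   # placeholder retrieval module
    state["passages"] = ["passage about " + state["query"]]
    return state

def generator(state: dict) -> dict:         # placeholder generation module
    state["answer"] = f"Answer drawn from {len(state['passages'])} passage(s)."
    return state

# Different tasks reuse the same runner with different module lists, e.g.:
# run_pipeline("what is RAG?", [dense_retriever, generator])
```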

Core Components and Evaluation of RAG

Research on RAG spans retrievers and generators, with a core focus on fine-tuning both components to improve answer accuracy and relevance. For instance, RAG with iterative retrieval refines the retrieved evidence over multiple rounds, yielding more relevant and concise context that enhances LLM performance. For evaluation, frameworks such as RAGAS and ARES score RAG systems on metrics like faithfulness, answer relevance, and context recall to measure effectiveness.
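
As a simplified picture of iterative retrieval, the loop below folds the model's intermediate answer back into the query so that later rounds can surface evidence the first pass missed; `retrieve` and `llm_generate` are assumed stubs rather than real APIs.

```python
# Iterative retrieval sketch: the intermediate answer is appended to the
# query so subsequent retrieval rounds can gather complementary evidence.
# `retrieve` and `llm_generate` are caller-supplied stubs in this example.

def iterative_rag(question: str, retrieve, llm_generate, rounds: int = 3) -> str:
    query = question
    evidence: list[str] = []
    answer = ""
    for _ in range(rounds):
        evidence.extend(retrieve(query))              # gather new passages
        prompt = (
            "Context:\n" + "\n".join(evidence) +
            f"\n\nQuestion: {question}\nAnswer:"
        )
        answer = llm_generate(prompt)
        query = f"{question} {answer}"                # refine the next retrieval
    return answer
```

Evaluation frameworks such as RAGAS then score the question, retrieved context, and answer jointly, rather than judging the answer in isolation.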

Future Directions and Horizontal Expansion

RAG has room for vertical optimization, such as addressing long-context limitations and improving robustness to noisy or irrelevant retrieval. Horizontal expansion has seen RAG applied across diverse modalities, from images to code, showcasing its flexibility and applicability. Finally, the growth of the RAG ecosystem, including technical stacks and tooling, points toward an all-encompassing RAG platform that maximizes the synergy between parametric and non-parametric knowledge while meeting engineering needs.

Continued improvement and diversification of RAG use cases are likely to further strengthen its performance and practical applications, making it an increasingly powerful tool in the generative AI landscape.

Authors: Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Haofen Wang, Meng Wang
Citations (920)