
Embedding-Informed Adaptive Retrieval-Augmented Generation of Large Language Models (2404.03514v2)

Published 4 Apr 2024 in cs.CL and cs.AI

Abstract: Retrieval-augmented LLMs have proven remarkably competent on a wide range of NLP tasks. However, previous works have observed that retrieval is not always helpful, especially when the LLM already knows the answer to the query. Motivated by this, Adaptive Retrieval-Augmented Generation (ARAG) retrieves only when the knowledge the query asks for is absent from the LLM. Existing ARAG methods either require access to the pre-training corpus or rely on prompting with additional model inferences. To avoid these drawbacks, we propose to determine whether the model is knowledgeable about a query by inspecting the (contextualized) pre-trained token embeddings of LLMs. We hypothesize that such embeddings capture rich information about the model's intrinsic knowledge, enabling an efficient way of judging whether retrieval from an external corpus is necessary. Extensive experiments demonstrate our ARAG approach's superior performance across various benchmarks.
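The decision the abstract describes, retrieving only when the model's own embeddings suggest a knowledge gap, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the model name, the mean-pooling, the linear probe, and the threshold are all assumptions standing in for whatever embedding-based judge the authors actually train.

```python
# Hypothetical sketch of embedding-informed adaptive retrieval (not the paper's exact method).
# Idea: pool the LLM's contextualized token embeddings for the query and feed them to a
# lightweight scorer that predicts whether external retrieval is needed.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # any causal LM; this name is illustrative

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

# A tiny linear probe standing in for whatever classifier/threshold the paper trains
# on top of the embeddings; in practice it would be fit on labeled queries.
probe = torch.nn.Linear(model.config.hidden_size, 1)

@torch.no_grad()
def should_retrieve(query: str, threshold: float = 0.5) -> bool:
    """Return True if the probe judges the model's parametric knowledge insufficient."""
    inputs = tokenizer(query, return_tensors="pt")
    outputs = model(**inputs)
    # Mean-pool the last layer's contextualized token embeddings of the query.
    last_hidden = outputs.hidden_states[-1]   # shape: (1, seq_len, hidden_size)
    pooled = last_hidden.mean(dim=1)          # shape: (1, hidden_size)
    knowledge_gap_score = torch.sigmoid(probe(pooled)).item()
    return knowledge_gap_score > threshold

query = "Who won the 2023 Nobel Prize in Physics?"
if should_retrieve(query):
    pass  # call a retriever (e.g., BM25 or a dense retriever) and augment the prompt
else:
    pass  # answer directly from the LLM's parametric knowledge
```

Under these assumptions, the extra cost of the retrieval decision is a single forward pass over the query plus a small probe, rather than access to the pre-training corpus or additional prompted inferences.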

Authors (7)
  1. Chengkai Huang (13 papers)
  2. Rui Wang (996 papers)
  3. Kaige Xie (11 papers)
  4. Tong Yu (119 papers)
  5. Lina Yao (194 papers)
  6. Yu Xia (65 papers)
  7. Julian McAuley (238 papers)
Citations (3)