
In-Context Retrieval-Augmented Language Models (2302.00083v3)

Published 31 Jan 2023 in cs.CL and cs.IR
Abstract: Retrieval-Augmented Language Modeling (RALM) methods, which condition a language model (LM) on relevant documents from a grounding corpus during generation, were shown to significantly improve language modeling performance. In addition, they can mitigate the problem of factually inaccurate text generation and provide a natural source attribution mechanism. Existing RALM approaches focus on modifying the LM architecture in order to facilitate the incorporation of external information, significantly complicating deployment. This paper considers a simple alternative, which we dub In-Context RALM: leaving the LM architecture unchanged and prepending grounding documents to the input, without any further training of the LM. We show that In-Context RALM that builds on off-the-shelf general purpose retrievers provides surprisingly large LM gains across model sizes and diverse corpora. We also demonstrate that the document retrieval and ranking mechanism can be specialized to the RALM setting to further boost performance. We conclude that In-Context RALM has considerable potential to increase the prevalence of LM grounding, particularly in settings where a pretrained LM must be used without modification or even via API access.

In-Context Retrieval-Augmented Language Models

The paper "In-Context Retrieval-Augmented Language Models" presents an approach to enhancing language model (LM) performance that requires minimal alteration to existing models. The authors investigate an alternative method, termed In-Context RALM, which allows pre-trained LMs to benefit from external information without changes to their architecture or additional training. The approach simply prepends retrieved documents to the LM's input sequence.
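The mechanism can be sketched in a few lines. The helper below is illustrative, not the authors' code: `toy_retrieve` is a hypothetical stand-in for whatever off-the-shelf retriever (BM25, a dense retriever) supplies the grounding documents.

```python
def ralm_prompt(query, retrieve, k=1):
    """Build an In-Context RALM input: retrieved documents are simply
    prepended to the original input; the LM itself is left unchanged."""
    docs = retrieve(query, k)  # any off-the-shelf retriever
    return "\n\n".join(docs) + "\n\n" + query

# Hypothetical stand-in retriever, for illustration only. A real
# system would query BM25 or a dense retriever over a grounding corpus.
def toy_retrieve(query, k):
    corpus = ["Grounding document relevant to the query topic."]
    return corpus[:k]
```

The resulting string is fed to the LM as-is, which is why the method works even when the model is only reachable through an API.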

Traditionally, Retrieval-Augmented Language Modeling (RALM) has required architectural changes to incorporate external data, complicating deployment and reducing flexibility. These approaches typically rely on dense retrievers and additional training phases, which are both time-consuming and computationally costly. In contrast, In-Context RALM leverages off-the-shelf general-purpose retrievers, particularly favoring BM25, a sparse lexical retrieval method. This strategy yields performance improvements comparable to increasing model size by two to three times, without any parameter adjustment or fine-tuning.
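For readers unfamiliar with BM25, a minimal self-contained scorer is sketched below. It implements the standard Okapi BM25 formula over pre-tokenized documents; the parameter defaults (`k1=1.5`, `b=0.75`) are common conventions, not values taken from the paper.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` (a list of token lists) against
    `query` (a token list) with the Okapi BM25 formula."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus:
        for t in set(doc):
            df[t] += 1
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores
```

Because BM25 needs no training and no neural encoder, it pairs naturally with the plug-and-play spirit of In-Context RALM.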

The paper evaluates In-Context RALM across several datasets: WikiText-103, RealNews, and subsets of The Pile such as ArXiv, Stack Exchange, and FreeLaw. These datasets provide diverse testing grounds for the proposed method. The demonstrated gains suggest that when grounding documents are well chosen, a moderately sized LM can reach performance levels akin to much larger counterparts. For example, with an off-the-shelf retriever, a 6.7B-parameter model from the OPT family matched the performance of a 66B-parameter model.

The authors further explore specializing the document retrieval mechanism for the RALM setting. Beyond using BM25's lexical retrieval out of the box, they study reranking strategies that better align the selected documents with the LM's task. They first apply zero-shot reranking, scoring candidate documents with a smaller LM, and then train a dedicated bidirectional reranker via self-supervision derived from the LM's own signals. Both strategies yield further measurable gains without any changes to the LM itself.
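The zero-shot reranking idea can be sketched as follows: among the retriever's top candidates, keep the document under which the LM assigns the highest probability to the text being generated. This is a schematic sketch, not the paper's implementation; `toy_logprob` is a hypothetical stand-in (it just counts shared words), whereas a real system would query an LM for log p(target | document, prefix).

```python
def rerank(candidates, prefix, target, lm_logprob):
    """Pick the candidate document that makes the scoring LM assign the
    highest log-probability to the target continuation."""
    return max(candidates, key=lambda doc: lm_logprob(doc + "\n" + prefix, target))

# Hypothetical stand-in scorer, for illustration only: counts words of
# the continuation that appear in the conditioning text. A real reranker
# would call a (smaller) LM here.
def toy_logprob(context, target):
    ctx = set(context.lower().split())
    return sum(1 for w in target.lower().split() if w in ctx)
```

The trained reranker in the paper replaces the zero-shot scorer with a model fine-tuned on this same signal, which is what makes the procedure self-supervised.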

The paper's contributions are practical, offering methodological insights for improving the factual accuracy and reliability of LM outputs, a salient concern for applications where LMs generate information-sensitive text. The strategy also enables retrieval augmentation in scenarios where models are accessible only through an API, broadening its applicability to diverse use cases.

In the context of open-domain question answering (ODQA), the authors confirm that the advantages of In-Context RALM extend beyond pure language modeling. Using retrieval-augmented LMs without any specialized training for the ODQA setting, they show that zero-shot performance can be significantly boosted across model sizes.

The findings hold implications for future RALM development, suggesting that significant LM improvements can be achieved through careful retrieval design rather than extensive architectural tailoring or retraining. This line of research points to a future in which increasing model size is not the only path to better performance; informed interaction with curated external information sources can yield equal, if not superior, results in efficiency and accuracy. As the availability and complexity of information grow, methods for seamlessly integrating it into LMs' decision-making will become increasingly vital. The simplicity and efficacy of In-Context RALM are a step toward more accessible and deployable AI systems that maintain competitive performance.

Authors (7)
  1. Ori Ram (14 papers)
  2. Yoav Levine (24 papers)
  3. Itay Dalmedigos (5 papers)
  4. Dor Muhlgay (6 papers)
  5. Amnon Shashua (44 papers)
  6. Kevin Leyton-Brown (57 papers)
  7. Yoav Shoham (22 papers)
Citations (409)