Surface-Based Retrieval Reduces Perplexity of Retrieval-Augmented Language Models (2305.16243v3)

Published 25 May 2023 in cs.CL

Abstract: Augmenting LLMs with a retrieval mechanism has been shown to significantly improve their performance while keeping the number of parameters low. Retrieval-augmented models commonly rely on a semantic retrieval mechanism based on the similarity between dense representations of the query chunk and potential neighbors. In this paper, we study the state-of-the-art Retro model and observe that its performance gain is better explained by surface-level similarities, such as token overlap. Inspired by this, we replace the semantic retrieval in Retro with a surface-level method based on BM25, obtaining a significant reduction in perplexity. As full BM25 retrieval can be computationally costly for large datasets, we also apply it in a re-ranking scenario, gaining part of the perplexity reduction with minimal computational overhead.

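The re-ranking scenario mentioned in the abstract can be illustrated with a short, self-contained sketch: a dense retriever proposes a small set of candidate neighbor chunks, and BM25 scores computed against the query chunk are used to re-order them, so only the cheap lexical scoring is applied to a handful of candidates rather than to the whole datastore. This is only an illustration of the idea, not the paper's implementation; the function name bm25_rerank, the whitespace tokenization, the default k1 and b values, and the candidate-set-only idf statistics are assumptions made for the sketch.

```python
import math
from collections import Counter


def bm25_rerank(query_chunk, candidate_chunks, k1=1.2, b=0.75):
    """Re-rank a dense retriever's candidate chunks by BM25 score against the query chunk.

    Returns the candidates sorted by descending surface-level (lexical-overlap) score.
    """
    # Whitespace tokenization stands in for a proper tokenizer/analyzer.
    docs = [c.lower().split() for c in candidate_chunks]
    query_terms = set(query_chunk.lower().split())

    n = len(docs)
    avgdl = sum(len(d) for d in docs) / max(n, 1)

    # Document frequencies computed over the candidate set only (an assumption for
    # self-containedness; a production re-ranker would use corpus-level statistics).
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}

    def idf(t):
        # Standard BM25 idf with +0.5 smoothing.
        return math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)

    scores = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        s = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            s += idf(t) * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)

    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [candidate_chunks[i] for i in order]


if __name__ == "__main__":
    # Toy usage: the candidate with the most token overlap is ranked first.
    query = "retrieval augmented language models"
    candidates = [
        "attention is all you need",
        "retrieval augmented language models reduce perplexity",
        "dense passage retrieval for open domain question answering",
    ]
    print(bm25_rerank(query, candidates))
```
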
Authors (4)
  1. Ehsan Doostmohammadi (11 papers)
  2. Tobias Norlund (6 papers)
  3. Marco Kuhlmann (13 papers)
  4. Richard Johansson (18 papers)
Citations (7)