
Continual Learning for Generative Retrieval over Dynamic Corpora (2308.14968v1)

Published 29 Aug 2023 in cs.IR

Abstract: Generative retrieval (GR) directly predicts the identifiers of relevant documents (i.e., docids) based on a parametric model. It has achieved solid performance on many ad-hoc retrieval tasks. So far, these tasks have assumed a static document collection. In many practical scenarios, however, document collections are dynamic, where new documents are continuously added to the corpus. The ability to incrementally index new documents while preserving the ability to answer queries with both previously and newly indexed relevant documents is vital to applying GR models. In this paper, we address this practical continual learning problem for GR. We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents. Empirical results demonstrate the effectiveness and efficiency of the proposed model.
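The abstract's first contribution builds on product quantization (PQ) for docid construction: a document embedding is split into sub-vectors, each mapped to its nearest centroid in a per-sub-space codebook, and the sequence of centroid indices serves as the docid. The following is a minimal illustrative sketch of that PQ encoding step only, not the paper's CLEVER implementation; the function names, toy codebooks, and 4-dimensional embedding are invented for illustration, and the incremental codebook updates with adaptive thresholds described in the paper are not shown.

```python
# Sketch of product-quantization docid assignment (illustrative only):
# split the embedding into M sub-vectors, quantize each against its own
# codebook, and use the centroid indices as the document identifier.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(embedding, codebooks):
    """Return the PQ docid: one nearest-centroid index per sub-space.

    codebooks[m] holds the centroids for the m-th sub-space.
    """
    m = len(codebooks)
    sub_len = len(embedding) // m
    docid = []
    for i, book in enumerate(codebooks):
        sub = embedding[i * sub_len:(i + 1) * sub_len]
        docid.append(min(range(len(book)), key=lambda k: sq_dist(sub, book[k])))
    return docid

# Toy example: a 4-dim embedding, 2 sub-spaces, 2 centroids per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # centroids for the first sub-space
    [[0.0, 1.0], [1.0, 0.0]],  # centroids for the second sub-space
]
print(pq_encode([0.9, 1.1, 0.1, 0.8], codebooks))  # -> [1, 0]
```

In the incremental setting the paper targets, only the parts of the codebook affected by newly arriving documents would be updated (gated by the two adaptive thresholds), which keeps re-indexing cost low compared to rebuilding all codebooks from scratch.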

Authors (7)
  1. Jiangui Chen (8 papers)
  2. Ruqing Zhang (60 papers)
  3. Jiafeng Guo (161 papers)
  4. Maarten de Rijke (263 papers)
  5. Wei Chen (1290 papers)
  6. Yixing Fan (55 papers)
  7. Xueqi Cheng (274 papers)
Citations (28)