
Auto Search Indexer for End-to-End Document Retrieval (2310.12455v2)

Published 19 Oct 2023 in cs.IR

Abstract: Generative retrieval, a new paradigm for document retrieval, has recently attracted research interest because it encodes all documents into the model and directly generates the retrieved documents. However, its potential remains underutilized because it relies heavily on "preprocessed" document identifiers (docids), which limits both its retrieval performance and its ability to retrieve new documents. In this paper, we propose a novel, fully end-to-end retrieval paradigm. It not only learns the best docids for existing and new documents automatically via a semantic indexing module, but also performs end-to-end document retrieval via an encoder-decoder-based generative model, namely Auto Search Indexer (ASI). In addition, we design a reparameterization mechanism that combines these two modules into a joint optimization framework. Extensive experimental results demonstrate the superiority of our model over advanced baselines on both public and industrial datasets, and also verify its ability to handle new documents.

Authors (7)
  1. Tianchi Yang (15 papers)
  2. Minghui Song (18 papers)
  3. Zihan Zhang (120 papers)
  4. Haizhen Huang (18 papers)
  5. Weiwei Deng (29 papers)
  6. Feng Sun (34 papers)
  7. Qi Zhang (784 papers)
Citations (10)