NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders (2305.14499v2)

Published 23 May 2023 in cs.CL and cs.IR

Abstract: Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that only requires 10^-6% of the Transformer's FLOPs per document and can be served using commodity CPUs. When combined with a BM25 retriever, this approach matches the quality of a state-of-the-art dual encoder retriever, which still requires an accelerator for query encoding. We introduce NAIL (Non-Autoregressive Indexing with Language models) as a model architecture that is compatible with recent encoder-decoder and decoder-only large language models, such as T5, GPT-3 and PaLM. This model architecture can leverage existing pre-trained checkpoints and can be fine-tuned for efficiently constructing document representations that do not require neural processing of queries.
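
To make the serving-time claim concrete, here is a minimal sketch of how a NAIL-style system could score documents on a CPU: term weights are precomputed per document at indexing time (in the paper, via a single non-autoregressive decoder pass), and query-time scoring is a sparse lexical lookup, optionally mixed with BM25. The function names, the toy weights, and the linear combination with `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch (not the paper's exact scoring function).
# Indexing time: each document gets a sparse term -> weight map, precomputed
# by a neural model. Query time: plain lexical lookups, no accelerator needed.
from typing import Dict, List


def lexical_score(query_tokens: List[str], doc_term_weights: Dict[str, float]) -> float:
    """Sum precomputed per-term weights for the query's tokens (CPU-only)."""
    return sum(doc_term_weights.get(tok, 0.0) for tok in query_tokens)


def rerank(query_tokens: List[str],
           candidates: Dict[str, Dict[str, float]],
           bm25_scores: Dict[str, float],
           alpha: float = 0.5) -> List[str]:
    """Order candidate doc ids by a hypothetical mix of BM25 and lexical scores."""
    def combined(doc_id: str) -> float:
        return (alpha * bm25_scores.get(doc_id, 0.0)
                + (1.0 - alpha) * lexical_score(query_tokens, candidates[doc_id]))
    return sorted(candidates, key=combined, reverse=True)


if __name__ == "__main__":
    # Toy precomputed index with made-up term weights.
    index = {
        "doc1": {"neural": 1.2, "reranker": 0.9, "gpu": 0.3},
        "doc2": {"bm25": 1.1, "lexical": 0.8, "cpu": 0.7},
    }
    bm25 = {"doc1": 2.0, "doc2": 3.5}
    print(rerank(["lexical", "cpu", "reranker"], index, bm25))
```

The key design point the abstract emphasizes is that all neural computation happens at indexing time; query processing reduces to dictionary lookups and additions, which is why commodity CPUs suffice.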

Authors (4)
  1. Livio Baldini Soares (18 papers)
  2. Daniel Gillick (11 papers)
  3. Jeremy R. Cole (10 papers)
  4. Tom Kwiatkowski (21 papers)
Citations (1)
