Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking (2105.09816v1)

Published 20 May 2021 in cs.IR and cs.CL

Abstract: An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models (e.g., BERT) to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers. A major drawback of this approach is high query latency due to the cost of evaluating every passage in the document with BERT. To make matters worse, this high inference cost and latency vary with the length of the document, with longer documents requiring more time and computation. To address this challenge, we adopt an intra-document cascading strategy, which prunes passages of a candidate document using a less expensive model, called ESM, before running a more expensive and effective scoring model, called ETM. We found it best to train the ESM (short for Efficient Student Model) via knowledge distillation from the ETM (short for Effective Teacher Model), e.g., BERT. This pruning allows us to run the ETM only on a small set of passages whose size does not vary with document length. Our experiments on the MS MARCO and TREC Deep Learning Track benchmarks suggest that the proposed Intra-Document Cascaded Ranking Model (IDCM) leads to over 400% lower query latency while providing essentially the same effectiveness as state-of-the-art BERT-based document ranking models.
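
To make the cascade concrete, here is a minimal Python sketch of the two-stage idea described in the abstract. It is not the authors' implementation: `idcm_rank`, `esm_score`, `etm_score`, and the budget `k` are hypothetical stand-ins for the cheap distilled student, the expensive BERT-based teacher, and the fixed passage budget, and max-pooling stands in for the paper's pooling/Transformer aggregation step.

```python
from typing import Callable, List

def idcm_rank(
    query: str,
    passages: List[str],
    esm_score: Callable[[str, str], float],  # cheap Efficient Student Model
    etm_score: Callable[[str, str], float],  # expensive Effective Teacher Model
    k: int = 4,                              # fixed passage budget for the ETM
) -> float:
    """Score one document by cascading: prune with the ESM, then score the
    survivors with the ETM. Assumes the document has at least one passage."""
    # Stage 1: the cheap ESM scores every passage in the document.
    ranked = sorted(passages, key=lambda p: esm_score(query, p), reverse=True)
    # Prune to a fixed budget of k passages, so the expensive stage's cost
    # no longer grows with document length.
    survivors = ranked[:k]
    # Stage 2: the expensive ETM scores only the surviving passages.
    etm_scores = [etm_score(query, p) for p in survivors]
    # Aggregate passage scores into one document score (max-pooling here;
    # the paper aggregates by pooling or additional Transformer layers).
    return max(etm_scores)

# Toy usage with stand-in lexical-overlap scorers (real models replace these):
if __name__ == "__main__":
    doc = ["passage about cats", "passage about ranking models", "filler text"]
    overlap = lambda q, p: float(len(set(q.split()) & set(p.split())))
    print(idcm_rank("neural document ranking", doc,
                    esm_score=overlap,
                    etm_score=lambda q, p: 2.0 * overlap(q, p)))
```

Because `k` is a constant, the ETM's per-document cost is bounded regardless of document length, which is the source of the latency savings the abstract reports; the ESM itself would be trained to imitate the ETM's passage scores via knowledge distillation, as the abstract describes.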

Authors (5)
  1. Sebastian Hofstätter (31 papers)
  2. Bhaskar Mitra (78 papers)
  3. Hamed Zamani (88 papers)
  4. Nick Craswell (51 papers)
  5. Allan Hanbury (45 papers)
Citations (39)