Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
102 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

TWOLAR: a TWO-step LLM-Augmented distillation method for passage Reranking (2403.17759v1)

Published 26 Mar 2024 in cs.IR

Abstract: In this paper, we present TWOLAR: a two-stage pipeline for passage reranking based on the distillation of knowledge from LLMs (LLM). TWOLAR introduces a new scoring strategy and a distillation process consisting in the creation of a novel and diverse training dataset. The dataset consists of 20K queries, each associated with a set of documents retrieved via four distinct retrieval methods to ensure diversity, and then reranked by exploiting the zero-shot reranking capabilities of an LLM. Our ablation studies demonstrate the contribution of each new component we introduced. Our experimental results show that TWOLAR significantly enhances the document reranking ability of the underlying model, matching and in some cases even outperforming state-of-the-art models with three orders of magnitude more parameters on the TREC-DL test sets and the zero-shot evaluation benchmark BEIR. To facilitate future work we release our data set, finetuned models, and code.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Davide Baldelli (1 paper)
  2. Junfeng Jiang (11 papers)
  3. Akiko Aizawa (74 papers)
  4. Paolo Torroni (17 papers)
Citations (6)
X Twitter Logo Streamline Icon: https://streamlinehq.com