
PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval (2108.06027v2)

Published 13 Aug 2021 in cs.IR, cs.AI, and cs.CL

Abstract: Recently, dense passage retrieval has become a mainstream approach to finding relevant information in various natural language processing tasks. A number of studies have been devoted to improving the widely adopted dual-encoder architecture. However, most of the previous studies only consider query-centric similarity relation when learning the dual-encoder retriever. In order to capture more comprehensive similarity relations, we propose a novel approach that leverages both query-centric and PAssage-centric sImilarity Relations (called PAIR) for dense passage retrieval. To implement our approach, we make three major technical contributions by introducing formal formulations of the two kinds of similarity relations, generating high-quality pseudo labeled data via knowledge distillation, and designing an effective two-stage training procedure that incorporates passage-centric similarity relation constraint. Extensive experiments show that our approach significantly outperforms previous state-of-the-art models on both MSMARCO and Natural Questions datasets.
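The abstract names two similarity relations but does not spell them out. The sketch below is one plausible reading, built on several assumptions that are not stated in the abstract: inner-product similarity between dual-encoder embeddings, a single negative passage per query, and a pairwise softmax (cross-entropy) loss for each relation. Under those assumptions, the query-centric relation asks that the query q score higher with the positive passage p+ than with a negative p-. The passage-centric relation asks that p+ score higher with q than with p-. The mixing weight `alpha` and every function name here are hypothetical and for illustration only. This is not the authors' code, so consult the full paper for the exact formulations and training schedule.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity (assumed; common in dual-encoder retrievers)."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_softmax_loss(pos: float, neg: float) -> float:
    """-log(exp(pos) / (exp(pos) + exp(neg))), computed stably via max-shift."""
    m = max(pos, neg)
    return -(pos - m) + math.log(math.exp(pos - m) + math.exp(neg - m))


def query_centric_loss(q: Vector, p_pos: Vector, p_neg: Vector) -> float:
    # Query-centric relation: s(q, p+) should exceed s(q, p-).
    return pairwise_softmax_loss(dot(q, p_pos), dot(q, p_neg))


def passage_centric_loss(q: Vector, p_pos: Vector, p_neg: Vector) -> float:
    # Passage-centric relation: s(p+, q) should exceed s(p+, p-).
    return pairwise_softmax_loss(dot(p_pos, q), dot(p_pos, p_neg))


def combined_loss(q: Vector, p_pos: Vector, p_neg: Vector, alpha: float = 0.1) -> float:
    # Hypothetical weighted mix of the two relations; alpha is an assumed hyperparameter.
    return ((1.0 - alpha) * query_centric_loss(q, p_pos, p_neg)
            + alpha * passage_centric_loss(q, p_pos, p_neg))


if __name__ == "__main__":
    q, p_pos, p_neg = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
    print(query_centric_loss(q, p_pos, p_neg))
    print(passage_centric_loss(q, p_pos, p_neg))
    print(combined_loss(q, p_pos, p_neg))
```

With `alpha = 0` the combined loss reduces to the usual query-centric objective. A positive `alpha` additionally pushes the positive passage away from the negative one in embedding space, which is the intuition the abstract gives for capturing "more comprehensive similarity relations."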

Authors (9)
  1. Ruiyang Ren (18 papers)
  2. Shangwen Lv (5 papers)
  3. Yingqi Qu (11 papers)
  4. Jing Liu (526 papers)
  5. Wayne Xin Zhao (196 papers)
  6. Hua Wu (191 papers)
  7. Haifeng Wang (194 papers)
  8. Ji-Rong Wen (299 papers)
  9. Qiaoqiao She (9 papers)
Citations (85)