Multi-View Document Representation Learning for Open-Domain Dense Retrieval (2203.08372v1)

Published 16 Mar 2022 in cs.CL and cs.IR

Abstract: Dense retrieval has achieved impressive advances in first-stage retrieval from large-scale document collections. It is built on a bi-encoder architecture that produces a single vector representation for each query and each document. However, a document can usually answer multiple potential queries from different views, so a single vector representation of a document is hard to match with multi-view queries and faces a semantic mismatch problem. This paper proposes a multi-view document representation learning framework that produces multi-view embeddings to represent documents and enforces their alignment with different queries. First, we propose a simple yet effective method of generating multiple embeddings through viewers. Second, to prevent the multi-view embeddings from collapsing into the same one, we further propose a global-local loss with annealed temperature that encourages the multiple viewers to better align with different potential queries. Experiments show that our method outperforms recent works and achieves state-of-the-art results.
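
To make the multi-view idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a document is scored against a query by taking the maximum inner product over the document's view embeddings, and it uses a hypothetical exponential schedule for the annealed temperature. The function names, the toy vectors, and the schedule constants (`tau_start`, `tau_min`, `decay`) are illustrative assumptions that the abstract does not specify.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multi_view_score(query, doc_views):
    """Score a query against a document that has several view embeddings.

    Assumption: the per-view scores are aggregated with max, so the
    best-matching view decides the document's relevance.
    """
    return max(dot(query, view) for view in doc_views)


def softmax_with_temperature(scores, tau):
    """Softmax over scores scaled by 1/tau; smaller tau gives a sharper distribution."""
    scaled = [s / tau for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(epoch, tau_start=1.0, tau_min=0.3, decay=0.075):
    """Hypothetical schedule: decay exponentially from tau_start, floored at tau_min."""
    return max(tau_min, tau_start * math.exp(-decay * epoch))


# Toy document with two views that point in different directions.
doc_views = [[1.0, 0.0], [0.0, 1.0]]
single_vector = [0.5, 0.5]  # mean of the two views, standing in for a single embedding

query_a = [0.9, 0.1]  # asks about the first aspect of the document
query_b = [0.1, 0.9]  # asks about the second aspect

for name, q in (("query_a", query_a), ("query_b", query_b)):
    print(name, "single:", dot(q, single_vector), "multi-view:", multi_view_score(q, doc_views))

# As training progresses the temperature shrinks, and the distribution over
# per-view scores concentrates on the best-matching view.
view_scores = [dot(query_a, v) for v in doc_views]
for epoch in (0, 10, 100):
    tau = annealed_temperature(epoch)
    print(epoch, round(tau, 3), softmax_with_temperature(view_scores, tau))
```

In this toy setup the averaged single vector scores both queries the same, while the multi-view score lets each query match the view closest to it. That is the semantic mismatch the abstract describes, under the max-aggregation assumption stated above.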

Authors (5)

Shunyu Zhang
Yaobo Liang
Ming Gong
Daxin Jiang
Nan Duan

Citations (56)

Related Papers