SpaDE: Improving Sparse Representations using a Dual Document Encoder for First-stage Retrieval (2209.05917v3)

Published 13 Sep 2022 in cs.IR

Abstract: Sparse document representations have been widely used to retrieve relevant documents via exact lexical matching. Backed by a pre-computed inverted index, they support fast ad-hoc search but suffer from the vocabulary mismatch problem. Although recent neural ranking models built on pre-trained language models can address this problem, they usually incur expensive query inference costs, implying a trade-off between effectiveness and efficiency. To tackle this trade-off, we propose a novel uni-encoder ranking model, the Sparse retriever using a Dual document Encoder (SpaDE), which learns document representations via a dual encoder. The two encoders play complementary roles: (i) adjusting the importance of terms that appear in the document to improve lexical matching, and (ii) expanding the document with additional terms to support semantic matching. Furthermore, our co-training strategy trains the dual encoder effectively while preventing the two encoders from unnecessarily interfering with each other. Experimental results on several benchmarks show that SpaDE outperforms existing uni-encoder ranking models.
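
The abstract compresses the core design into two sentences, so a rough illustration may help: the sketch below shows how a term-weighting encoder and a term-expansion encoder could each emit a vocabulary-sized sparse vector, which is then merged into a single document representation that fits an inverted index. This is a minimal PyTorch sketch based only on the abstract; the module names, ReLU scoring heads, additive merge, and dimensions are all illustrative assumptions, not SpaDE's actual architecture or training code.

```python
# Minimal sketch of the dual-document-encoder idea described in the abstract.
# All module names, shapes, and the merge rule are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 30522   # assumed BERT-style vocabulary size
HIDDEN_DIM = 768     # assumed encoder hidden size

class TermWeightingEncoder(nn.Module):
    """Scores the importance of terms already present in the document
    (supports exact lexical matching)."""
    def __init__(self):
        super().__init__()
        self.scorer = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, token_ids, token_states):
        # token_ids: (batch, seq_len); token_states: (batch, seq_len, HIDDEN_DIM)
        weights = torch.relu(self.scorer(token_states)).squeeze(-1)  # (batch, seq_len)
        rep = torch.zeros(token_ids.size(0), VOCAB_SIZE)
        # Scatter per-token importance into a vocabulary-sized sparse vector.
        rep.scatter_add_(1, token_ids, weights)
        return rep

class TermExpansionEncoder(nn.Module):
    """Predicts weights for terms *not* in the document
    (supports semantic matching via expansion)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, pooled_state):
        # pooled_state: (batch, HIDDEN_DIM), e.g. a [CLS]-style summary vector
        return torch.relu(self.proj(pooled_state))  # (batch, VOCAB_SIZE)

def dual_document_representation(token_ids, token_states, pooled_state,
                                 weighting, expansion):
    # Merge the two sparse vectors into one vocabulary-sized representation
    # that can be pre-computed offline and stored in an inverted index,
    # keeping query-time inference cheap.
    return weighting(token_ids, token_states) + expansion(pooled_state)

if __name__ == "__main__":
    batch, seq_len = 2, 16
    token_ids = torch.randint(0, VOCAB_SIZE, (batch, seq_len))
    token_states = torch.randn(batch, seq_len, HIDDEN_DIM)  # stand-in for encoder output
    pooled_state = torch.randn(batch, HIDDEN_DIM)
    rep = dual_document_representation(token_ids, token_states, pooled_state,
                                       TermWeightingEncoder(), TermExpansionEncoder())
    print(rep.shape)  # torch.Size([2, 30522])
```

Note that all of the neural computation happens on the document side: because only documents pass through the dual encoder, queries can stay as plain lexical lookups against the index, which is what lets a uni-encoder model like this sidestep per-query inference cost.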

Authors (6)
  1. Eunseong Choi (8 papers)
  2. Sunkyung Lee (9 papers)
  3. Minjin Choi (22 papers)
  4. Hyeseon Ko (1 paper)
  5. Young-In Song (2 papers)
  6. Jongwuk Lee (24 papers)
Citations (15)