Cross-Lingual Training with Dense Retrieval for Document Retrieval (2109.01628v1)

Published 3 Sep 2021 in cs.CL and cs.IR

Abstract: Dense retrieval has shown great success in passage ranking in English. However, its effectiveness in document retrieval for non-English languages remains unexplored due to limited training resources. In this work, we explore different techniques for transferring document ranking from English annotations to multiple non-English languages. Our experiments on test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families reveal that zero-shot model-based transfer using mBERT improves search quality in non-English mono-lingual retrieval. We also find that weakly-supervised target-language transfer yields performance competitive with generation-based target-language transfer, which requires external translators and query generators.
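To illustrate the model-based transfer idea (a dense retriever trained on English relevance data, then applied zero-shot to queries and documents in other languages), here is a minimal sketch of mBERT-based bi-encoder scoring. It is not the authors' exact pipeline: the off-the-shelf `bert-base-multilingual-cased` checkpoint, [CLS] pooling, and dot-product scoring are assumptions for illustration, whereas the paper's retriever is fine-tuned on English annotations before transfer.

```python
# Sketch: zero-shot cross-lingual dense retrieval with mBERT (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def encode(texts):
    """Encode texts into dense vectors via [CLS] pooling (a common choice)."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[:, 0]  # (batch, hidden_dim)

# A non-English (French) query scored against candidate documents:
# because mBERT shares one multilingual embedding space, an English-trained
# encoder can still rank documents in the target language.
query_vecs = encode(["quel est le rôle de la banque centrale ?"])
doc_vecs = encode([
    "La banque centrale fixe les taux directeurs et surveille l'inflation.",
    "Le football est le sport le plus populaire au monde.",
])
scores = query_vecs @ doc_vecs.T          # dot-product relevance scores
ranking = scores.argsort(dim=-1, descending=True)
print(scores, ranking)
```

In practice the encoder would first be fine-tuned on English query-document pairs (e.g., with a contrastive ranking loss) and only then evaluated zero-shot on the target languages, which is the transfer setting the abstract describes.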

Authors (4)
  1. Peng Shi (80 papers)
  2. Rui Zhang (1138 papers)
  3. He Bai (50 papers)
  4. Jimmy Lin (208 papers)
Citations (6)