Cross-Lingual Training with Dense Retrieval for Document Retrieval (2109.01628v1)

Published 3 Sep 2021 in cs.CL and cs.IR

Abstract: Dense retrieval has shown great success in passage ranking in English. However, its effectiveness in document retrieval for non-English languages remains unexplored due to limited training resources. In this work, we explore different techniques for transferring document ranking from English annotations to multiple non-English languages. Our experiments on test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families reveal that zero-shot model-based transfer using mBERT improves search quality in non-English mono-lingual retrieval. We also find that weakly-supervised target-language transfer yields performance competitive with generation-based target-language transfer, which requires external translators and query generators.
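To illustrate the model-based transfer idea (a dense retriever trained on English relevance data, then applied zero-shot to queries and documents in other languages), here is a minimal sketch of mBERT-based bi-encoder scoring. It is not the authors' exact pipeline: the off-the-shelf `bert-base-multilingual-cased` checkpoint, [CLS] pooling, and dot-product scoring are assumptions for illustration, whereas the paper's retriever is fine-tuned on English annotations before transfer.

```python
# Sketch: zero-shot cross-lingual dense retrieval with mBERT (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def encode(texts):
    """Encode texts into dense vectors via [CLS] pooling (a common choice)."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[:, 0]  # (batch, hidden_dim)

# A non-English (French) query scored against candidate documents:
# because mBERT shares one multilingual embedding space, an English-trained
# encoder can still rank documents in the target language.
query_vecs = encode(["quel est le rôle de la banque centrale ?"])
doc_vecs = encode([
    "La banque centrale fixe les taux directeurs et surveille l'inflation.",
    "Le football est le sport le plus populaire au monde.",
])
scores = query_vecs @ doc_vecs.T          # dot-product relevance scores
ranking = scores.argsort(dim=-1, descending=True)
print(scores, ranking)
```

In practice the encoder would first be fine-tuned on English query-document pairs (e.g., with a contrastive ranking loss) and only then evaluated zero-shot on the target languages, which is the transfer setting the abstract describes.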

Authors (4)
  1. Peng Shi (80 papers)
  2. Rui Zhang (1138 papers)
  3. He Bai (50 papers)
  4. Jimmy Lin (208 papers)
Citations (6)