
Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning (1912.13080v1)

Published 30 Dec 2019 in cs.IR, cs.CL, and cs.LG

Abstract: While billions of non-English-speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Mandarin Chinese, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance.

Authors (3)
  1. Sean MacAvaney (75 papers)
  2. Luca Soldaini (62 papers)
  3. Nazli Goharian (43 papers)
Citations (29)
