
Neural Approaches to Multilingual Information Retrieval (2209.01335v2)

Published 3 Sep 2022 in cs.IR and cs.CL

Abstract: Providing access to information across languages has been a goal of Information Retrieval (IR) for decades. While progress has been made on Cross Language IR (CLIR), where queries are expressed in one language and documents in another, the multilingual (MLIR) task of creating a single ranked list of documents across many languages is considerably more challenging. This paper investigates whether advances in neural document translation and pretrained multilingual neural language models enable improvements in the state of the art over earlier MLIR techniques. The results show that although combining neural document translation with neural ranking yields the best Mean Average Precision (MAP), 98% of that MAP score can be achieved with an 84% reduction in indexing time by using a pretrained XLM-R multilingual language model to index documents in their native language, and that 2% difference in effectiveness is not statistically significant. Key to achieving these results for MLIR is to fine-tune XLM-R using mixed-language batches from neural translations of MS MARCO passages.
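The abstract's key training detail is building mixed-language batches from translated MS MARCO passages, so that each fine-tuning batch exposes the ranker to several languages at once. A minimal sketch of that batching idea in plain Python (the data layout and function name here are illustrative assumptions, not the paper's actual pipeline, which fine-tunes XLM-R on the resulting batches):

```python
import random

def build_mixed_language_batches(examples, languages, batch_size, seed=0):
    """Sketch of mixed-language batch construction for MLIR fine-tuning.

    examples: list of (query, {lang: translated_passage}) pairs, where each
    passage is assumed to be available in several translated languages.
    For each training pair, a passage language is sampled independently,
    so any given batch mixes passages from multiple languages.
    """
    rng = random.Random(seed)
    pairs = []
    for query, translations in examples:
        lang = rng.choice(languages)  # sample a language per example
        pairs.append((query, translations[lang], lang))
    rng.shuffle(pairs)
    # chunk the shuffled pairs into fixed-size batches
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
```

Each batch then feeds a standard cross-encoder or dense-retrieval training step; the mixing happens at the data level, not in the model.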

Authors (4)
  1. Dawn Lawrie (31 papers)
  2. Eugene Yang (38 papers)
  3. Douglas W. Oard (18 papers)
  4. James Mayfield (21 papers)
Citations (14)