
Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations (1906.03492v1)

Published 8 Jun 2019 in cs.CL

Abstract: In this paper, we propose to boost low-resource cross-lingual document retrieval performance with deep bilingual query-document representations. We match queries and documents in both source and target languages with four components, each of which is implemented as a term interaction-based deep neural network with cross-lingual word embeddings as input. By including query likelihood scores as extra features, our model effectively learns to rerank the retrieved documents by using a small number of relevance labels for low-resource language pairs. Due to the shared cross-lingual word embedding space, the model can also be directly applied to another language pair without any training label. Experimental results on the MATERIAL dataset show that our model outperforms the competitive translation-based baselines on English-Swahili, English-Tagalog, and English-Somali cross-lingual information retrieval tasks.
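The abstract describes scoring query-document pairs with term interaction-based networks over cross-lingual word embeddings, then combining those scores with query likelihood features to rerank candidates. A minimal sketch of that idea is below; the pooling scheme and the fixed mixing weight `alpha` are illustrative simplifications (the paper uses four deep components and learns the combination from relevance labels), and all function names here are hypothetical.

```python
import numpy as np

def term_interaction_score(query_vecs, doc_vecs):
    """Score a query-document pair from a term-interaction matrix.

    query_vecs, doc_vecs: arrays of shape (n_terms, dim) holding
    cross-lingual word embeddings. Simplification: instead of a deep
    network over the interaction matrix, max-pool cosine similarities
    over document terms and sum over query terms.
    """
    # L2-normalize rows so the dot product below is cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T  # (n_query_terms, n_doc_terms) interaction matrix
    # Best-matching document term for each query term, summed.
    return float(sim.max(axis=1).sum())

def rerank(query_vecs, docs, ql_scores, alpha=0.5):
    """Rerank candidate documents by mixing the interaction score with
    a query-likelihood score. alpha is a made-up fixed weight standing
    in for the learned combination described in the abstract."""
    scored = [
        (i, alpha * term_interaction_score(query_vecs, d) + (1 - alpha) * ql)
        for i, (d, ql) in enumerate(zip(docs, ql_scores))
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Because the embeddings live in a shared cross-lingual space, the same scoring function applies unchanged to a new language pair, which is what enables the zero-label transfer the abstract mentions.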

Authors (8)
  1. Rui Zhang (1138 papers)
  2. Caitlin Westerfield (2 papers)
  3. Sungrok Shim (4 papers)
  4. Garrett Bingham (10 papers)
  5. Alexander Fabbri (11 papers)
  6. Neha Verma (18 papers)
  7. William Hu (4 papers)
  8. Dragomir Radev (98 papers)
Citations (19)