
Cross-Lingual Phrase Retrieval (2204.08887v1)

Published 19 Apr 2022 in cs.CL

Abstract: Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval remains an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines that use word-level or sentence-level representations. XPR also shows impressive zero-shot transferability, which enables the model to perform retrieval on a language pair unseen during training. Our dataset, code, and trained models are publicly available at www.github.com/cwszz/XPR/.
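To make the retrieval setting concrete, here is a minimal sketch of phrase retrieval over phrase vectors built from example sentences. It is not the XPR implementation: the abstract does not say how per-sentence representations are combined or scored, so the mean pooling, the cosine similarity, the function names (`mean_vector`, `cosine`, `retrieve`), and the hand-made 3-dimensional vectors are all assumptions chosen for illustration.

```python
import math

def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors (assumed pooling step)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, candidates):
    """Return the candidate phrase whose vector is most similar to query_vec."""
    return max(candidates, key=lambda phrase: cosine(query_vec, candidates[phrase]))

# Toy vectors standing in for encoder outputs of a phrase in its example
# sentences; a real system would obtain these from a trained encoder.
english_examples = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]
french_candidates = {
    "apprentissage automatique": mean_vector([[0.8, 0.2, 0.0], [0.9, 0.1, 0.1]]),
    "pomme de terre": mean_vector([[0.0, 0.9, 0.3], [0.1, 1.0, 0.2]]),
}

query = mean_vector(english_examples)
best = retrieve(query, french_candidates)
print(best)
```

In this sketch, each phrase gets one vector per example sentence, the vectors are averaged into a single phrase representation, and the query phrase is matched to the nearest candidate in the other language.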

Authors (8)
  1. Heqi Zheng (3 papers)
  2. Xiao Zhang (435 papers)
  3. Zewen Chi (29 papers)
  4. Heyan Huang (107 papers)
  5. Tan Yan (6 papers)
  6. Tian Lan (162 papers)
  7. Wei Wei (425 papers)
  8. Xian-Ling Mao (76 papers)
Citations (3)