
Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model (1810.12836v4)

Published 30 Oct 2018 in cs.CL

Abstract: A significant roadblock in multilingual neural language modeling is the lack of labeled non-English data. One potential method for overcoming this issue is learning cross-lingual text representations that can be used to transfer the performance from training on English tasks to non-English tasks, despite little to no task-specific non-English data. In this paper, we explore a natural setup for learning cross-lingual sentence representations: the dual-encoder. We provide a comprehensive evaluation of our cross-lingual representations on a number of monolingual, cross-lingual, and zero-shot/few-shot learning tasks, and also give an analysis of different learned cross-lingual embedding spaces.
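
The abstract's dual-encoder setup can be illustrated with a minimal sketch: two encoders map a sentence and its translation into a shared space, and a ranking loss pulls matched pairs together while using other sentences in the batch as negatives. The mean-pooled embedding encoders, dot-product scoring, and in-batch softmax loss below are illustrative assumptions, not the paper's exact architecture or training objective.

```python
# Minimal dual-encoder sketch (illustrative, not the paper's exact model).
# Assumptions: mean-pooled embedding encoders, L2-normalized outputs,
# dot-product scores, and an in-batch softmax ranking loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); mean-pool the non-padding embeddings.
        mask = (token_ids != 0).unsqueeze(-1).float()
        summed = (self.embed(token_ids) * mask).sum(dim=1)
        pooled = summed / mask.sum(dim=1).clamp(min=1.0)
        return F.normalize(self.proj(pooled), dim=-1)

def ranking_loss(src_vecs: torch.Tensor, tgt_vecs: torch.Tensor) -> torch.Tensor:
    # Score every source sentence against every target in the batch; the
    # matched translation is the positive, all other targets are negatives.
    scores = src_vecs @ tgt_vecs.t()            # (batch, batch)
    labels = torch.arange(scores.size(0))
    return F.cross_entropy(scores, labels)

# Toy usage with random token ids standing in for a parallel batch.
src_encoder = SentenceEncoder(vocab_size=10_000)
tgt_encoder = SentenceEncoder(vocab_size=10_000)
src_batch = torch.randint(1, 10_000, (8, 16))
tgt_batch = torch.randint(1, 10_000, (8, 16))
loss = ranking_loss(src_encoder(src_batch), tgt_encoder(tgt_batch))
loss.backward()
```

Once trained on translation pairs (and, per the paper, additional English tasks in a multi-task setup), the source-language encoder's sentence vectors can be fed to a classifier trained only on English data, giving the zero-shot/few-shot transfer the abstract describes.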

Authors (7)
  1. Muthuraman Chidambaram (2 papers)
  2. Yinfei Yang (73 papers)
  3. Daniel Cer (28 papers)
  4. Steve Yuan (5 papers)
  5. Yun-Hsuan Sung (18 papers)
  6. Brian Strope (11 papers)
  7. Ray Kurzweil (11 papers)
Citations (124)