English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too (2005.13013v2)

Published 26 May 2020 in cs.CL

Abstract: Intermediate-task training---fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task---often improves model performance substantially on language understanding tasks in monolingual English settings. We investigate whether English intermediate-task training is still helpful on non-English target tasks. Using nine intermediate language-understanding tasks, we evaluate intermediate-task transfer in a zero-shot cross-lingual setting on the XTREME benchmark. We see large improvements from intermediate training on the BUCC and Tatoeba sentence retrieval tasks and moderate improvements on question-answering target tasks. MNLI, SQuAD and HellaSwag achieve the best overall results as intermediate tasks, while multi-task intermediate training offers small additional improvements. Using our best intermediate-task models for each target task, we obtain a 5.4 point improvement over XLM-R Large on the XTREME benchmark, setting the state of the art as of June 2020. We also investigate continuing multilingual MLM during intermediate-task training and using machine-translated intermediate-task data, but neither consistently outperforms simply performing English intermediate-task training.
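As a concrete illustration of the two-stage recipe described in the abstract, the sketch below fine-tunes a multilingual encoder (XLM-R) on an English intermediate task (MNLI) and then on an English target task, after which the non-English test sets would be evaluated zero-shot. This is a minimal sketch using Hugging Face Transformers; the dataset choices, hyperparameters, and Trainer-based setup are illustrative assumptions, not the authors' exact training pipeline.

```python
# Stage 1: English intermediate-task training; Stage 2: target-task fine-tuning.
# All hyperparameters here are placeholders for illustration only.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


def tokenize_pairs(batch):
    # MNLI and XNLI share the premise/hypothesis sentence-pair format.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)


def finetune(model, train_dataset, output_dir):
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    )
    Trainer(model=model, args=args, train_dataset=train_dataset).train()
    return model


# Stage 1: English intermediate task (MNLI).
mnli = load_dataset("glue", "mnli", split="train").map(tokenize_pairs, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
model = finetune(model, mnli, "xlmr-mnli-intermediate")

# Stage 2: target-task fine-tuning on English data only (here XNLI's English
# portion); the non-English XNLI test sets are then evaluated zero-shot.
# Because MNLI and XNLI share the same three labels, the classification head
# can be reused; for a different target task it would be re-initialized.
xnli_en = load_dataset("xnli", "en", split="train").map(tokenize_pairs, batched=True)
model = finetune(model, xnli_en, "xlmr-xnli-target")
```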

Authors (8)
  1. Jason Phang (40 papers)
  2. Iacer Calixto (25 papers)
  3. Phu Mon Htut (18 papers)
  4. Yada Pruksachatkun (12 papers)
  5. Haokun Liu (26 papers)
  6. Clara Vania (16 papers)
  7. Katharina Kann (50 papers)
  8. Samuel R. Bowman (103 papers)
Citations (65)