Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation (2310.09917v3)

Published 15 Oct 2023 in cs.CL

Abstract: Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model (mPLM), finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering both full finetuning and parameter-efficient finetuning with adapters. We find that mBART with adapters performs similarly to mT5 of the same size, and that NLLB-200 can be competitive in some cases. We also underline the importance of tuning the learning rate used for finetuning, which helps to alleviate the problem of generation in the wrong language.

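The abstract describes finetuning mBART-style models and mitigating generation in the wrong language. Below is a minimal, hypothetical sketch (not the paper's actual code) of the cross-lingual inference step with Hugging Face Transformers: an mBART-50 checkpoint, the language codes, and the input sentence are illustrative assumptions, and forcing the target-language BOS token at generation time is one common way to keep the output in the intended language.

```python
# Hypothetical sketch: zero-shot cross-lingual generation with mBART.
# Assumes the model was already finetuned on the task in a single source
# language (e.g., English summarization); checkpoint name, language codes,
# and the example input are illustrative, not taken from the paper.
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = "facebook/mbart-large-50"  # assumed checkpoint
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# At test time, feed an input in another language (here French) and force
# the decoder to start with the French language token.
tokenizer.src_lang = "fr_XX"
inputs = tokenizer("Un exemple d'article à résumer.", return_tensors="pt")

output_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"],  # steer output language
    max_new_tokens=60,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

In the paper's setting, the model would first be finetuned on the task in one language, either fully or with adapters, with the finetuning learning rate tuned; the snippet only illustrates the zero-shot inference side of that pipeline.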
Authors (3)
  1. Nadezhda Chirkova (25 papers)
  2. Sheng Liang (11 papers)
  3. Vassilina Nikoulina (28 papers)