
Lifting the Curse of Multilinguality by Pre-training Modular Transformers (2205.06266v1)

Published 12 May 2022 in cs.CL

Abstract: Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learns language-specific components post-hoc, we pre-train the modules of our Cross-lingual Modular (X-Mod) models from the start. Our experiments on natural language inference, named entity recognition and question answering show that our approach not only mitigates the negative interference between languages, but also enables positive transfer, resulting in improved monolingual and cross-lingual performance. Furthermore, our approach enables adding languages post-hoc with no measurable drop in performance, no longer limiting the model usage to the set of pre-trained languages.
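The abstract's central claim is that adding a private module per language grows the model's total capacity while the number of trainable parameters *per language* stays constant, and that new languages can be added post-hoc by training only their module. The following is a minimal parameter-accounting sketch of that idea, not the paper's implementation; the class name, the bottleneck-adapter-style module design, and all dimensions are illustrative assumptions.

```python
class ModularLayer:
    """Toy accounting sketch of a shared layer plus per-language modules.

    Assumption (not from the paper): each language module is a
    bottleneck adapter with a down- and an up-projection.
    """

    def __init__(self, d_model, d_bottleneck, languages):
        self.d_model = d_model
        self.d_bottleneck = d_bottleneck
        # shared weights, pre-trained jointly on all languages
        self.shared_params = d_model * d_model
        # one private module per language
        self.lang_params = {
            lang: 2 * d_model * d_bottleneck for lang in languages
        }

    def add_language(self, lang):
        # post-hoc language addition: only the new module gets trained,
        # leaving the shared weights and other modules untouched
        self.lang_params[lang] = 2 * self.d_model * self.d_bottleneck

    def trainable_params(self, lang):
        # per-language training cost: shared core + that language's module,
        # independent of how many languages the model covers
        return self.shared_params + self.lang_params[lang]

    def total_params(self):
        # total capacity grows linearly with the number of languages
        return self.shared_params + sum(self.lang_params.values())
```

Under these assumptions, `trainable_params` is identical for every language no matter how many modules exist, while `total_params` keeps growing as languages are added, which is the mechanism the abstract credits for lifting the curse of multilinguality.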

Authors (7)
  1. Jonas Pfeiffer (34 papers)
  2. Naman Goyal (37 papers)
  3. Xi Victoria Lin (39 papers)
  4. Xian Li (116 papers)
  5. James Cross (22 papers)
  6. Sebastian Riedel (140 papers)
  7. Mikel Artetxe (52 papers)
Citations (131)
