Lego-MT: Learning Detachable Models for Massively Multilingual Machine Translation (2212.10551v3)

Published 20 Dec 2022 in cs.CL and cs.AI

Abstract: Multilingual neural machine translation (MNMT) aims to build a unified model for many language directions. Existing monolithic models for MNMT encounter two challenges: parameter interference among languages and inefficient inference for large models. In this paper, we revisit the classic multi-way structures and develop a detachable model by assigning each language (or group of languages) to an individual branch that supports plug-and-play training and inference. To address the needs of learning representations for all languages in a unified space, we propose a novel efficient training recipe, upon which we build an effective detachable model, Lego-MT. For a fair comparison, we collect data from OPUS and build a translation benchmark covering 433 languages and 1.3B parallel data. Experiments show that Lego-MT with 1.2B parameters brings an average gain of 3.2 spBLEU. It even outperforms M2M-100 with 12B parameters. The proposed training recipe brings a 28.2× speedup over the conventional multi-way training method. (Code: https://github.com/CONE-MT/Lego-MT)
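The core architectural idea in the abstract is that each language (or language group) owns a detachable branch that maps into a shared representation space, so serving one translation direction only requires loading the two relevant branches. The PyTorch sketch below illustrates that plug-and-play pattern under stated assumptions: the class and method names (`LanguageBranch`, `DetachableMT`, `attach`) are illustrative, not the actual Lego-MT implementation, and the branches here are simplified Transformer stacks rather than the paper's full encoder-decoder design.

```python
# Minimal sketch of the "detachable branch" idea: one branch per language,
# attached or detached on demand. Names and structure are assumptions for
# illustration, not the Lego-MT codebase.
from typing import Optional

import torch
import torch.nn as nn


class LanguageBranch(nn.Module):
    """A per-language branch; in the paper's setup, branches can be trained
    and stored independently, then plugged in at inference time."""

    def __init__(self, d_model: int = 512, nhead: int = 8, num_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.stack = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stack(x)


class DetachableMT(nn.Module):
    """Plug-and-play container: only the source and target branches for the
    requested direction need to be in memory."""

    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleDict()  # language code -> LanguageBranch

    def attach(self, lang: str, path: Optional[str] = None) -> None:
        branch = LanguageBranch()
        if path is not None:
            # Load a previously trained branch saved via torch.save(state_dict).
            branch.load_state_dict(torch.load(path))
        self.branches[lang] = branch

    def detach_branch(self, lang: str) -> None:
        del self.branches[lang]  # free memory once a direction is served

    def forward(self, x: torch.Tensor, src: str, tgt: str) -> torch.Tensor:
        h = self.branches[src](x)      # encode into the shared latent space
        return self.branches[tgt](h)   # map out with the target-language branch


# Usage: serve French -> German without touching any other language's branch.
model = DetachableMT()
model.attach("fr")
model.attach("de")
out = model(torch.randn(1, 10, 512), src="fr", tgt="de")
```

This modularity is what avoids the parameter interference and inference inefficiency the abstract attributes to monolithic MNMT models: adding or updating a language touches only its own branch, and unused branches never occupy memory at inference.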

Authors (7)
  1. Fei Yuan
  2. Yinquan Lu
  3. Lingpeng Kong
  4. Lei Li
  5. Yu Qiao
  6. Jingjing Xu
  7. Wenhao Zhu
Citations (19)

GitHub: https://github.com/CONE-MT/Lego-MT