
The VolcTrans System for WMT22 Multilingual Machine Translation Task (2210.11599v1)

Published 20 Oct 2022 in cs.CL

Abstract: This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo bitext from back-translation. A series of heuristic rules clean both bilingual and monolingual texts. On the official test set, our system achieves 17.3 BLEU, 21.9 spBLEU, and 41.9 chrF2++ on average over all language pairs. The average inference speed is 11.5 sentences per second using a single Nvidia Tesla V100 GPU. Our code and trained models are available at https://github.com/xian8/wmt22
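The abstract reports three automatic metrics: BLEU, spBLEU, and chrF2++. The paper does not include evaluation code, but all three can be computed with the sacrebleu library; the sketch below shows how, with placeholder hypothesis/reference sentences that are purely illustrative.

```python
# Sketch: scoring system output with the three metrics reported above,
# using sacrebleu (the hypothesis/reference strings are illustrative
# placeholders, not data from the paper).
import sacrebleu

hyps = ["The cat sat on the mat."]            # system translations
refs = [["The cat is sitting on the mat."]]   # one reference stream

# Standard BLEU with sacrebleu's default 13a tokenizer.
bleu = sacrebleu.corpus_bleu(hyps, refs)

# spBLEU: BLEU computed on FLORES-101 SentencePiece subwords.
spbleu = sacrebleu.corpus_bleu(hyps, refs, tokenize="flores101")

# chrF2++: character n-gram F-score (beta=2) plus word n-grams up to order 2.
chrf2pp = sacrebleu.corpus_chrf(hyps, refs, word_order=2)

print(f"BLEU {bleu.score:.1f}  spBLEU {spbleu.score:.1f}  chrF2++ {chrf2pp.score:.1f}")
```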

Authors (7)
  1. Xian Qian (4 papers)
  2. Kai Hu (55 papers)
  3. Jiaqiang Wang (3 papers)
  4. Yifeng Liu (36 papers)
  5. Xingyuan Pan (9 papers)
  6. Jun Cao (108 papers)
  7. Mingxuan Wang (83 papers)
Citations (1)