
Larger-Scale Transformers for Multilingual Masked Language Modeling (2105.00572v1)

Published 2 May 2021 in cs.CL

Abstract: Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models, dubbed XLM-R XL and XLM-R XXL, outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests that pretrained models with larger capacity may obtain strong performance on high-resource languages while greatly improving low-resource languages. We make our code and models publicly available.
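The models in the paper are trained with the standard masked language modeling objective over many languages. As a rough illustration of how such a checkpoint is queried at inference time, the sketch below uses the Hugging Face `transformers` library with the publicly available `xlm-roberta-large` baseline; the larger XL/XXL checkpoints expose the same fill-mask interface, but the exact model identifiers are an assumption here, not taken from the paper.

```python
# Minimal sketch: masked language modeling inference with an XLM-R-style model,
# assuming the Hugging Face `transformers` library is installed.
from transformers import pipeline

# "xlm-roberta-large" is the public baseline checkpoint; larger released
# checkpoints (if available on the Hub) can be swapped in by model id.
fill_mask = pipeline("fill-mask", model="xlm-roberta-large")

# XLM-R uses "<mask>" as its mask token; the pipeline returns the top-scoring
# candidate fills for the masked position from the multilingual vocabulary.
for candidate in fill_mask("The capital of France is <mask>."):
    print(f"{candidate['token_str']:>12}  score={candidate['score']:.3f}")
```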

Authors (5)
  1. Naman Goyal (37 papers)
  2. Jingfei Du (16 papers)
  3. Myle Ott (33 papers)
  4. Giri Anantharaman (2 papers)
  5. Alexis Conneau (33 papers)
Citations (112)