You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models (2210.07135v2)

Published 13 Oct 2022 in cs.CL

Abstract: Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, the performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized for their language. We use those teachers along with balanced (sub-sampled) data to distill the teachers' knowledge into a single multilingual student. Our method outperforms standard training methods in low-resource languages and retains performance on high-resource languages while using the same amount of data. If applied widely, our approach can increase the representation of low-resource languages in NLP systems.
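
The abstract names two ingredients: balanced (sub-sampled) data and teacher-student distillation. The following is a minimal sketch of both, not the authors' implementation. It assumes teacher and student share an output vocabulary and that the objective is a temperature-scaled KL divergence; the abstract specifies neither. The names `balanced_subsample`, `distillation_loss`, and `per_language` are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) for one token position.

    Assumption: both models score the same vocabulary, so the two
    logit lists are aligned index by index.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(
        pi * (math.log(pi) - math.log(qi))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def balanced_subsample(corpora, per_language, seed=0):
    """Draw at most `per_language` sentences from every language.

    Assumption: balancing means an equal sentence budget per language.
    """
    rng = random.Random(seed)
    return {
        lang: rng.sample(sentences, min(per_language, len(sentences)))
        for lang, sentences in corpora.items()
    }
```

In a full training loop, each sampled sentence would be scored by the monolingual teacher for its language, and the student would be updated to reduce `distillation_loss` against that teacher's output.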

Authors (3)

Tomasz Limisiewicz (18 papers)
Dan Malkin (5 papers)
Gabriel Stanovsky (61 papers)

Citations (3)
