Soft Language Clustering for Multilingual Model Pre-training (2306.07610v1)

Published 13 Jun 2023 in cs.CL

Abstract: Multilingual pre-trained LLMs have demonstrated impressive (zero-shot) cross-lingual transfer abilities; however, their performance is hindered when the target language is typologically distant from the source languages or when pre-training data is limited in size. In this paper, we propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods. On the XTREME tasks, including text classification, sequence labeling, question answering, and sentence retrieval, both base- and large-size LLMs pre-trained with our proposed method exhibit consistent performance improvements. Furthermore, the method provides substantial advantages for low-resource languages in unsupervised sentence retrieval and for target languages that differ greatly from the source language in cross-lingual transfer.
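The core mechanism described in the abstract, retrieving a prompt for each instance so the encoder can condition on it, can be illustrated with a minimal sketch. Note that the pool size, prompt length, mean-pooled query, and softmax-weighted mixing below are illustrative assumptions; the paper's actual XLM-P retrieval module and its integration into the multilingual encoder may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualPromptRetriever(nn.Module):
    """Sketch of contextual prompt retrieval: each instance softly selects a
    prompt from a shared pool of learnable prompt vectors, which is then
    prepended to its token embeddings. Hypothetical, not the paper's exact design."""

    def __init__(self, hidden_size: int, pool_size: int = 32, prompt_length: int = 4):
        super().__init__()
        # Pool of learnable prompts: each entry is `prompt_length` vectors.
        self.prompt_pool = nn.Parameter(
            torch.randn(pool_size, prompt_length, hidden_size) * 0.02
        )
        # Keys used to score pool entries against an instance-level query.
        self.prompt_keys = nn.Parameter(torch.randn(pool_size, hidden_size) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden)
        # Mean-pool the token embeddings to get one query vector per instance.
        query = token_embeddings.mean(dim=1)                 # (batch, hidden)
        scores = query @ self.prompt_keys.t()                # (batch, pool)
        weights = F.softmax(scores, dim=-1)                  # (batch, pool)
        # Softly mix the pool entries into one retrieved prompt per instance.
        prompt = torch.einsum("bp,plh->blh", weights, self.prompt_pool)
        # Prepend the prompt so the encoder conditions on it.
        return torch.cat([prompt, token_embeddings], dim=1)  # (batch, L+seq, hidden)


if __name__ == "__main__":
    retriever = ContextualPromptRetriever(hidden_size=768)
    dummy = torch.randn(2, 16, 768)   # fake batch of token embeddings
    out = retriever(dummy)
    print(out.shape)                  # torch.Size([2, 20, 768])
```

In this sketch the retrieved prompt is simply concatenated in front of the token embeddings, so a single shared encoder can pick up instance-specific (and hence language-specific) guidance without allocating separate parameters per language, which matches the "lightweight" framing in the abstract.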

Authors (8)
  1. Jiali Zeng (24 papers)
  2. Yufan Jiang (17 papers)
  3. Yongjing Yin (19 papers)
  4. Yi Jing (9 papers)
  5. Fandong Meng (174 papers)
  6. Binghuai Lin (20 papers)
  7. Yunbo Cao (43 papers)
  8. Jie Zhou (687 papers)
Citations (4)