Crosslingual Generalization through Multitask Finetuning
The paper "Crosslingual Generalization through Multitask Finetuning" explores the potential of large multilingual LLMs to generalize across tasks and languages through Multitask Prompted Finetuning (MTF). Focusing on pretrained multilingual BLOOM and mT5 model families, the researchers developed finetuned variants known as BLOOMZ and mT0.
Key Findings and Methodology
Multitask finetuning trains a model on a mixture of tasks recast as natural language prompts. The paper argues that this approach, previously shown to be effective in English-only settings, can also benefit multilingual models. The researchers evaluate the models' zero-shot performance, that is, their ability to perform held-out tasks without any task-specific training data.
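To make the setup concrete, the sketch below shows how a single task instance might be recast as a natural-language prompt/target pair for multitask finetuning. The template wording and the example fields are illustrative assumptions, not the exact templates used to build xP3.

```python
# Minimal sketch of prompted multitask data construction.
# The template text and the NLI example below are illustrative assumptions,
# not the actual templates used in the paper.

def to_prompt_example(premise: str, hypothesis: str, label: str) -> dict:
    """Recast a natural language inference instance as a prompt/target pair."""
    prompt = (
        f"{premise}\n"
        f"Question: Does the passage above imply that \"{hypothesis}\"? "
        "Yes, no, or maybe?"
    )
    return {"inputs": prompt, "targets": label}

example = to_prompt_example(
    premise="BLOOM was pretrained on text in 46 natural languages.",
    hypothesis="BLOOM is a multilingual model.",
    label="Yes",
)
print(example["inputs"])
print("->", example["targets"])
```

During finetuning, many such prompt/target pairs drawn from different tasks and templates are mixed together, so the model learns to follow the instruction embedded in the prompt rather than a single fixed task format.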
- BLOOMZ and mT0 Models: The paper presents BLOOMZ and mT0, produced by finetuning the BLOOM and mT5 models respectively. Finetuning on English tasks with English prompts enabled the models to perform the same tasks in non-English languages, suggesting inherent crosslingual transfer (a zero-shot usage sketch follows this list).
- Multilingual vs. English-only Finetuning: Finetuning on multilingual task mixtures further improved performance on both English and non-English tasks, achieving several state-of-the-art zero-shot results.
- Translated Prompts: The research also experiments with machine-translated prompts. Finetuning on prompts machine-translated into each dataset's language improved performance on human-written prompts in the corresponding languages, indicating the benefit of multilingual prompt exposure during finetuning.
- Generalization to Unseen Languages: The models exhibited zero-shot generalization to tasks in languages that were only minimally present in the pretraining corpus. This suggests the models are learning capabilities that are both task- and language-agnostic.
- xP3 Dataset: The authors introduce xP3, a composite of supervised datasets in 46 languages with English prompts, along with a machine-translated variant. These corpora provide an important resource for future research on crosslingual MTF.
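As a rough illustration of the zero-shot usage described in the bullets above, the sketch below runs a small released BLOOMZ checkpoint through the Hugging Face transformers library on a prompted instruction in French. The specific checkpoint name, prompt, and decoding settings are assumptions for illustration; larger BLOOMZ variants are used the same way, while mT0 checkpoints are sequence-to-sequence models loaded with AutoModelForSeq2SeqLM instead.

```python
# Sketch: zero-shot prompted inference with a small BLOOMZ checkpoint.
# The checkpoint name and the French prompt below are assumptions chosen
# for illustration, not an evaluation setup from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"  # assumed small public checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# A prompted classification-style instruction in a non-English language.
prompt = ("Critique : « Ce film était magnifique, je le recommande. » "
          "Le sentiment de cette critique est-il positif ou négatif ?")
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding is sufficient for short, classification-style answers.
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```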
Implications and Future Directions
The findings reinforce the potential of multilingual models to serve diverse languages when combined with multitask finetuning. The paper suggests that even models finetuned largely on English task data can extend their utility to a much broader set of languages through carefully designed finetuning.
Practical Implications: This research paves the way for deploying AI systems across regions with low-resource languages without the need for extensive language-specific training data.
Theoretical Implications: The models' ability to generalize across languages points to possible underlying universal language representations, which offer rich ground for further theoretical exploration.
Future Directions: Exploring these models' robustness, scaling up the finetuning datasets, and addressing remaining shortcomings on code generation tasks provide ample opportunities for advancing multilingual AI. Furthermore, examining the models' language-agnostic properties opens avenues for further work in cognitive modeling and the understanding of language processing systems.
Conclusion
The research offers compelling evidence that multitask finetuning leads to significant improvements in the performance of multilingual models across numerous languages and tasks. By demonstrating stronger generalization in large language models, the work underscores a shift towards more inclusive and accessible AI technologies. The datasets and finetuned models are publicly released, encouraging further research and development in this promising area.