Multilingual Generative LLMs for Few-Shot Learning
This paper investigates the effectiveness of multilingual generative LLMs on few-shot and zero-shot learning tasks, extending beyond the English-centric scope of models such as GPT-3. The proposed models, named XGLM, aim to counter the English dominance inherent in typical training data and to explore the cross-lingual potential of generative models within a multilingual framework. The authors train the models on a corpus covering 30 languages and compare their performance across a variety of tasks.
Key Contributions and Results
The authors train a family of four multilingual generative models, the largest with 7.5 billion parameters, and use them to study few-shot and zero-shot learning across several task families, including natural language understanding, commonsense reasoning, and machine translation. Key highlights include:
- Cross-Lingual Few-Shot Learning: The XGLM models show strong cross-lingual capabilities, performing well on non-English tasks when prompted with English templates and demonstration examples. They transfer task knowledge across languages without language-specific fine-tuning, underscoring their cross-lingual transferability (a minimal prompting sketch follows this list).
- State-of-the-Art Multilingual Performance: The paper reports substantial gains over GPT-3 on multilingual tasks. In machine translation, for instance, XGLM outperforms its GPT-3 counterpart on 171 of 182 FLORES-101 translation directions. It is also competitive on natural language inference and commonsense reasoning, reinforcing the value of a more balanced multilingual pre-training corpus.
- Impact of Model Scaling: An analysis of model scaling shows that larger models make better use of demonstration examples, improving few-shot performance in the target languages. However, the results also indicate diminishing returns for some high-resource languages as model size increases, which the authors relate to the "curse of multilinguality."
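
To make the cross-lingual prompting idea concrete, the sketch below scores candidate English label words with a publicly released XGLM checkpoint from the Hugging Face Hub. This is a minimal illustration under stated assumptions, not the authors' evaluation harness: the checkpoint name, the NLI-style prompt template, the Spanish example sentences, and the label verbalizations are all illustrative choices.

```python
# Minimal sketch: zero-shot cross-lingual classification by scoring label words
# with an XGLM checkpoint (assumed setup, not the paper's exact protocol).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/xglm-564M"  # smallest public XGLM size; larger ones exist

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sequence_log_prob(text: str) -> float:
    """Sum of log-probabilities the model assigns to the tokens of `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict token t+1, so shift targets by one.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# Illustrative NLI-style example: a Spanish premise/hypothesis pair scored
# with an English-style template and English label words (hypothetical choices).
premise = "Los modelos multilingües transfieren conocimiento entre idiomas."
hypothesis = "Los modelos solo funcionan en inglés."
labels = {"entailment": "Yes", "contradiction": "No", "neutral": "Maybe"}

scores = {
    name: sequence_log_prob(f"{premise}, right? {word}, {hypothesis}")
    for name, word in labels.items()
}
print(max(scores, key=scores.get))  # predicted label
```

Scoring a small set of candidate completions, rather than sampling free-form text, is a common way to evaluate few-shot classification with generative models; prepending a handful of demonstration examples to the prompt would turn this zero-shot setup into a few-shot one.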
Practical and Theoretical Implications
The research has significant implications for the development and deployment of multilingual LLMs. Practically, the results suggest that scaling up model capacity and training on a more diverse, balanced corpus can improve multilingual model performance, making such models strong alternatives to fine-tuned, language-specific systems. Theoretically, the findings contribute to understanding cross-lingual transfer mechanisms and the role of the pre-training data distribution in shaping performance across languages.
Future Directions
The paper opens several avenues for future research: investigating more sophisticated prompting strategies that could further improve cross-lingual performance, and better understanding the impact of different pre-training data distributions. Additionally, addressing the limitations related to the "curse of multilinguality" and systematically evaluating how pre-training data size and quality affect performance on low-resource languages could yield further insights.
Conclusion
This paper offers valuable insights into the capabilities of multilingual generative LLMs. By presenting a systematic comparison with existing models and demonstrating strong performance across a range of multilingual tasks, it makes a notable contribution to NLP, particularly toward more inclusive, language-agnostic AI systems. The findings underscore the potential of multilingual models for few-shot learning across diverse linguistic landscapes, paving the way for further advances in AI's multilingual capabilities.