
Multilingual Instruction Tuning With Just a Pinch of Multilinguality (2401.01854v4)

Published 3 Jan 2024 in cs.CL, cs.AI, and cs.LG

Abstract: As instruction-tuned LLMs gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated in an English tuning set substantially improve multilingual instruction-following, both in seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-responses.

Introduction to Multilingual Instruction Tuning

With the rise of LLMs, enhancing their multilingual capabilities has become a focal point for global usability. When these models are fine-tuned on instructions paired with corresponding responses, a process known as instruction tuning, they learn to follow instructions more effectively. To date, however, instruction tuning has been conducted predominantly with English examples. As LLMs serve an increasingly worldwide user base, multilingual instruction tuning, that is, fine-tuning on data from multiple languages, becomes essential.
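To make the setup concrete, the sketch below (plain Python; the prompt template and field names are illustrative assumptions, not the paper's exact format) shows how instruction-response pairs are typically serialized into prompt-completion text before standard supervised fine-tuning.

```python
# Minimal sketch of preparing instruction-tuning examples for supervised
# fine-tuning. The prompt template and field names are illustrative assumptions.

from dataclasses import dataclass
from typing import List


@dataclass
class InstructionExample:
    instruction: str   # the user instruction, in any language
    response: str      # the desired model response
    language: str      # language code of the instruction, e.g. "en"


PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def to_training_pair(example: InstructionExample) -> dict:
    """Serialize one example into a prompt/completion pair.

    During fine-tuning, the loss is typically computed only on the
    completion (response) tokens, not on the prompt tokens.
    """
    return {
        "prompt": PROMPT_TEMPLATE.format(instruction=example.instruction),
        "completion": example.response,
        "language": example.language,
    }


if __name__ == "__main__":
    examples: List[InstructionExample] = [
        InstructionExample("Summarize the paragraph below.", "A short summary.", "en"),
        InstructionExample("Resume el siguiente párrafo.", "Un resumen corto.", "es"),
    ]
    for pair in map(to_training_pair, examples):
        print(pair["language"], "->", repr(pair["prompt"][:30]))
```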

Cross-Lingual Transfer and Its Implications

Prior research has shown that models fine-tuned in one language can acquire capabilities in other languages, a phenomenon known as cross-lingual transfer. This paper offers new insights into that process in the context of instruction tuning for multilingual LLMs. A key finding is that tuning on a single language can already impart instruction-following abilities that extend beyond the tuning language; English, Italian, and Spanish in particular showed strong transfer to other languages.

The Surprising Effectiveness of Limited Multilinguality

The research presents a striking finding: including multilingual examples in an otherwise English instruction tuning set can substantially improve a model's multilingual instruction-following skills. This broadening of capabilities is not limited to languages included in the tuning set; it also improves performance in languages the model saw only during pretraining. Remarkably, the improvement appears with as few as 40 multilingual examples, a testament to the power of even a modest amount of language diversity during instruction tuning.
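As a rough illustration of this kind of mixture, the sketch below assembles a mostly English tuning set with a small multilingual slice split across several languages; the data layout, sampling strategy, and helper function are assumptions for illustration, not the paper's exact recipe.

```python
# Sketch of assembling a mostly-English instruction-tuning mixture with a
# small multilingual slice, in the spirit of the 40-example finding.
# The data structures and sampling strategy here are illustrative assumptions.

import random
from typing import Dict, List


def build_mixture(
    english_pool: List[dict],
    multilingual_pool: Dict[str, List[dict]],
    multilingual_budget: int = 40,
    seed: int = 0,
) -> List[dict]:
    """Combine an English tuning set with a small multilingual sample.

    `multilingual_pool` maps a language code to that language's examples;
    the budget is split as evenly as possible across the languages.
    """
    rng = random.Random(seed)
    languages = sorted(multilingual_pool)
    per_language = max(1, multilingual_budget // max(1, len(languages)))

    mixture = list(english_pool)
    for lang in languages:
        pool = multilingual_pool[lang]
        mixture.extend(rng.sample(pool, min(per_language, len(pool))))

    rng.shuffle(mixture)
    return mixture
```

With a budget of 40 examples and, say, four non-English languages, each language would contribute roughly ten examples to an otherwise English tuning set.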

Toward a Massively Multilingual Future

These findings have significant implications for the development of globally oriented LLMs. For one, achieving multilinguality in instruction-tuned models does not require exhaustive multilingual training data: models can be fine-tuned with a small set of instructions in a handful of languages and still follow instructions across many languages they were never explicitly tuned on. The paper also examines how the composition of the tuning set affects transfer, finding that diversifying it with even just 2-4 languages significantly improves cross-lingual generalization, while a language's share of the pretraining data does not show a strong correlation with transfer effectiveness.
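One simple way to quantify the cross-lingual comparisons discussed here is to average per-response evaluation scores separately for languages that were seen versus unseen during tuning. The sketch below assumes scores (e.g. from human raters or an LLM judge) are already available per response; the score format and aggregation are assumptions rather than the paper's exact evaluation protocol.

```python
# Sketch: aggregating per-language evaluation scores into "seen during
# tuning" vs. "unseen during tuning" averages, to compare cross-lingual
# generalization across tuning mixtures. The score format is an assumption.

from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Set, Tuple


def summarize(
    scored: Iterable[Tuple[str, float]],   # (language_code, score) per response
    tuned_languages: Set[str],
) -> Dict[str, float]:
    per_language = defaultdict(list)
    for lang, score in scored:
        per_language[lang].append(score)

    seen = [mean(v) for k, v in per_language.items() if k in tuned_languages]
    unseen = [mean(v) for k, v in per_language.items() if k not in tuned_languages]
    return {
        "seen_avg": mean(seen) if seen else float("nan"),
        "unseen_avg": mean(unseen) if unseen else float("nan"),
    }
```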

In summary, the paper argues that cross-lingual transfer during instruction tuning offers an efficient path to capable multilingual LLMs. These findings could change how models are fine-tuned, leveraging small but diverse amounts of multilingual data to serve a wide range of languages and making advanced AI technologies more accessible to users around the globe.

Authors (6)
  1. Uri Shaham (35 papers)
  2. Jonathan Herzig (34 papers)
  3. Roee Aharoni (35 papers)
  4. Idan Szpektor (47 papers)
  5. Reut Tsarfaty (54 papers)
  6. Matan Eyal (15 papers)
Citations (27)