
Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed? (2312.12683v2)

Published 20 Dec 2023 in cs.CL

Abstract: The vast majority of today's LLMs are English-centric, having been pretrained predominantly on English text. Yet, in order to meet user expectations, models need to be able to respond appropriately in multiple languages once deployed in downstream applications. This requires strong cross-lingual transfer abilities. In this work, we investigate the minimal amount of multilinguality required during finetuning to elicit cross-lingual generalisation in English-centric LLMs. In experiments across four LLMs, we find that multilingual instruction tuning with as few as two to three languages is both necessary and sufficient to elicit effective cross-lingual generalisation, with the limiting factor being the degree to which a target language is seen during pretraining. Evaluations on five different tasks further reveal that multilingual instruction tuning is most beneficial for generative tasks that assume input/output language agreement, such as in chat settings, while being of less importance for highly structured classification-style tasks. Our code and data are available at https://github.com/ZurichNLP/multilingual-instruction-tuning.
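The recipe the abstract describes, an otherwise English instruction-tuning set with a small number of additional languages mixed in, can be sketched in a few lines. The snippet below is a hypothetical illustration, not the authors' code (their implementation is at the GitHub link above); the file paths, field names, and sampling ratios are assumptions.

```python
import json
import random

# Hypothetical sketch of an English-centric instruction-tuning mix with
# two added languages, as in the setting the abstract describes.
# File paths, field names, and the 5% ratio are assumptions.
random.seed(0)

def load_examples(path, lang):
    """Load instruction/response pairs from a JSONL file and tag their language."""
    with open(path, encoding="utf-8") as f:
        return [{**json.loads(line), "lang": lang} for line in f]

english = load_examples("instructions_en.jsonl", "en")
extra_langs = {
    "de": load_examples("instructions_de.jsonl", "de"),
    "zh": load_examples("instructions_zh.jsonl", "zh"),
}

# Keep the mix English-centric: replace a small fraction of the English
# examples with examples from each added language.
n_per_lang = len(english) // 20  # assumed 5% per added language
mixed = english[: len(english) - n_per_lang * len(extra_langs)]
for lang, examples in extra_langs.items():
    mixed.extend(random.sample(examples, n_per_lang))
random.shuffle(mixed)

with open("instructions_mixed.jsonl", "w", encoding="utf-8") as f:
    for ex in mixed:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```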

Authors (3)
  1. Tannon Kew (3 papers)
  2. Florian Schottmann (5 papers)
  3. Rico Sennrich (87 papers)
Citations (28)