Zero-shot cross-lingual transfer in instruction tuning of large language models (2402.14778v2)
Abstract: Instruction tuning (IT) is widely used to teach pretrained LLMs to follow arbitrary instructions, but it is under-studied in multilingual settings. In this work, we conduct a systematic study of zero-shot cross-lingual transfer in IT, in which an LLM is instruction-tuned on English-only data and then tested on user prompts in other languages. We advocate for evaluating various aspects of model responses in multilingual instruction following and investigate the influence of different model configuration choices. We find that cross-lingual transfer does happen successfully in IT even if all stages of model training are English-centric, but only if multilinguality is taken into account during hyperparameter tuning and if the IT data are large enough. English-trained LLMs are capable of generating correct-language, comprehensive, and helpful responses in other languages, but suffer from low factuality and may occasionally exhibit fluency errors.
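The setup described in the abstract can be pictured with a short sketch (not the paper's actual code): an English-centric base model is instruction-tuned on an English-only instruction dataset and then prompted zero-shot in another language. The model name, dataset, trainer API, and hyperparameter values below are illustrative assumptions, and exact argument names may differ across library versions.

```python
# Illustrative sketch of zero-shot cross-lingual instruction tuning:
# English-only IT data for training, a non-English prompt at test time.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

base_model = "meta-llama/Llama-2-7b-hf"  # assumed English-centric backbone
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# English-only instruction-tuning data (e.g. the Dolly-15k dataset).
train_data = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_prompt(example):
    # Flatten each instruction/response pair into a single training text.
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n"
                f"### Response:\n{example['response']}"
    }

train_data = train_data.map(to_prompt)

# The paper stresses that hyperparameters (notably the learning rate) must be
# chosen with multilinguality in mind; the values below are only placeholders.
config = SFTConfig(output_dir="en-only-it", learning_rate=1e-5, num_train_epochs=3)
trainer = SFTTrainer(model=model, args=config, train_dataset=train_data)
trainer.train()

# Zero-shot cross-lingual test: prompt the English-tuned model in another language
# ("Briefly explain what transfer learning is", in French).
prompt = ("### Instruction:\nExplique brièvement ce qu'est l'apprentissage "
          "par transfert.\n\n### Response:\n")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Responses generated this way would then be judged along the aspects the paper highlights: language correctness, helpfulness, factuality, and fluency.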
Authors: Nadezhda Chirkova, Vassilina Nikoulina