Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? (2403.01929v1)
Abstract: Supervised fine-tuning (SFT), supervised instruction tuning (SIT), and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning. ICL has recently gained popularity with the advent of LLMs due to its simplicity and sample efficiency. Prior research has conducted only limited investigation into how these approaches work for multilingual few-shot learning, and the focus so far has been mostly on their performance. In this work, we present an extensive and systematic comparison of the three approaches, testing them on six high- and low-resource languages, three different NLU tasks, and a wide range of language and domain setups. Importantly, performance is only one aspect of our comparison: we also analyse the approaches through the lens of their computational, inference, and financial costs. Our observations show that supervised instruction tuning offers the best trade-off between performance and resource requirements. As another contribution, we analyse the impact of target-language adaptation of pretrained LLMs and find that standard adaptation approaches can (superficially) improve target-language generation capabilities, but language understanding elicited through ICL does not improve and remains limited, with especially low scores for low-resource languages.
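The abstract contrasts in-context learning with supervised instruction tuning for few-shot multilingual NLU. The minimal sketch below is not from the paper; it uses a hypothetical banking intent-detection example to illustrate the difference in how the two approaches consume the same few labelled examples: ICL packs them as demonstrations into a prompt for a frozen model, while SIT turns each example into an (input, target) pair used to update the model's weights (or adapters). The instruction wording, labels, and utterances are illustrative assumptions.

```python
# Sketch only (not the paper's code): building an ICL prompt vs. a SIT
# training instance for a hypothetical multilingual intent-detection task.
from dataclasses import dataclass


@dataclass
class Example:
    text: str    # user utterance (any language)
    intent: str  # gold intent label


# Hypothetical few-shot support set and test utterance.
support = [
    Example("I want to transfer 50 euros to my savings account", "transfer_money"),
    Example("Quiero bloquear mi tarjeta de crédito", "block_card"),
]
query = Example("How do I check my account balance?", "check_balance")

INSTRUCTION = "Classify the intent of the user utterance."


def build_icl_prompt(support, query_text):
    """ICL: a frozen LLM sees the instruction plus labelled demonstrations
    and must continue the pattern; no parameters are updated."""
    lines = [INSTRUCTION, ""]
    for ex in support:
        lines.append(f"Utterance: {ex.text}\nIntent: {ex.intent}\n")
    lines.append(f"Utterance: {query_text}\nIntent:")
    return "\n".join(lines)


def build_sit_instance(example):
    """SIT: each labelled example becomes an (input, target) pair on which
    the model (or a lightweight adapter) is fine-tuned."""
    return {
        "input": f"{INSTRUCTION}\nUtterance: {example.text}\nIntent:",
        "target": f" {example.intent}",
    }


if __name__ == "__main__":
    print(build_icl_prompt(support, query.text))
    print(build_sit_instance(support[0]))
```

At inference time, ICL pays for the long prompt on every query, whereas a SIT model only sees the instruction and the query, which is one reason the paper's cost analysis treats the two differently.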
Authors: Evgeniia Razumovskaia, Ivan Vulić, Anna Korhonen