Few-Shot Cross-Lingual Transfer for Prompting Large Language Models in Low-Resource Languages (2403.06018v1)
Abstract: Large pre-trained language models (PLMs) are at the forefront of advances in Natural Language Processing. One widespread use case of PLMs is "prompting", or in-context learning, where a user provides a description of a task and some completed examples of the task to a PLM as context before asking the PLM to perform the task on a new example. Only the largest, most capable PLMs perform in-context learning effectively, and these models are typically trained on predominantly English corpora, leaving all other languages behind. The data limitations in most languages preclude training language-specific PLMs capable of prompting. Despite the surge in work on prompting, it is still unclear how PLMs should be adapted cross-lingually specifically for prompting. We evaluate possible methods for adapting LLaMa, a 7B-parameter open-source PLM trained mainly on English, for prompting in low-resource languages, namely Kinyarwanda, Hausa, and Luganda. We consider three methods: few-shot prompting (prompt), language-adaptive fine-tuning (LAFT), and neural machine translation (translate), and evaluate on abstractive summarization, multi-class topic classification, and named-entity recognition. Although LAFT carries the greatest compute cost and intuitively should lead to the best results, our experiments show that it is only occasionally the optimal choice for adapting PLMs for prompting. Rather, the translate and prompt settings are compute-efficient and cost-effective approaches to few-shot prompting for the selected low-resource languages. We find that results are task- and language-dependent, but the prompting method performs best on average across all tasks and languages. Aggregated across all tasks and languages, the prompt setting outperforms both translate and LAFT with statistical significance for all numbers of shots.
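The few-shot "prompt" setting described above can be illustrated with a minimal sketch. The snippet below builds a prompt from a handful of completed topic-classification examples and asks a causal language model to continue it; the checkpoint identifier, prompt template, label names, and headline placeholders are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of the few-shot "prompt" setting, assuming a LLaMA-7B checkpoint is
# available through Hugging Face Transformers under an identifier such as
# "huggyllama/llama-7b" (an assumption, not necessarily the paper's checkpoint).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# A few completed examples (the "shots") followed by the new example to label.
# The <...> placeholders stand in for real headlines in the target language.
few_shot_prompt = (
    "Classify the topic of each news headline.\n"
    "Headline: <headline about a football match>\nTopic: sports\n"
    "Headline: <headline about an election>\nTopic: politics\n"
    "Headline: <new headline to classify>\nTopic:"
)

inputs = tokenizer(few_shot_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)

# The continuation generated after the final "Topic:" is taken as the prediction.
prediction = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prediction.strip())
```

The translate setting would presumably run the same prompting procedure after machine-translating the examples and the query, while LAFT would first continue pre-training the model on target-language text before prompting.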