Multilingual Few-Shot Learning via Language Model Retrieval (2306.10964v1)
Abstract: Transformer-based large language models (LLMs) have achieved remarkable success in few-shot in-context learning and have attracted substantial research interest. However, their performance depends heavily on the choice of example prompts and varies widely with how those examples are selected. In this paper, we conduct a comprehensive study of retrieving semantically similar few-shot examples and using them as context, which helps the model predict the correct label without any gradient updates in multilingual and cross-lingual settings. We evaluate the proposed method on five natural language understanding datasets covering intent detection, question classification, sentiment analysis, and topic classification. The proposed method consistently outperforms random sampling on both monolingual and cross-lingual tasks in non-English languages.
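As a rough illustration of the retrieval step described in the abstract, the sketch below embeds a labeled example pool with a multilingual sentence encoder, scores candidates by cosine similarity to the query, and formats the top-k matches as the in-context prompt for an LLM to complete. The encoder name, example data, value of k, and prompt template are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of retrieval-based example selection for few-shot prompting.
# Assumptions (not from the paper): sentence-transformers with the
# "paraphrase-multilingual-MiniLM-L12-v2" encoder, cosine similarity, k = 4.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Labeled pool that examples are retrieved from (placeholder data).
pool = [
    ("set an alarm for 7 am", "alarm"),
    ("play some jazz music", "music"),
    ("what's the weather in Jakarta", "weather"),
    ("remind me to call mom tonight", "reminder"),
]
pool_texts = [text for text, _ in pool]
pool_emb = encoder.encode(pool_texts, normalize_embeddings=True)

def build_prompt(query: str, k: int = 4) -> str:
    """Retrieve the k most similar labeled examples and format a prompt."""
    q_emb = encoder.encode([query], normalize_embeddings=True)[0]
    scores = pool_emb @ q_emb                 # cosine similarity (vectors are normalized)
    top = np.argsort(-scores)[:k]
    lines = [f"Input: {pool_texts[i]}\nLabel: {pool[i][1]}" for i in top]
    lines.append(f"Input: {query}\nLabel:")   # the LLM completes this label
    return "\n\n".join(lines)

print(build_prompt("wake me up at six tomorrow"))
```

The resulting prompt is passed to a frozen language model, so label prediction happens purely in context, without any gradient updates.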
- Genta Indra Winata
- Liang-Kang Huang
- Soumya Vadlamannati
- Yash Chandarana