Learning New Tasks from a Few Examples with Soft-Label Prototypes (2210.17437v4)

Published 31 Oct 2022 in cs.LG and cs.CL

Abstract: Existing approaches to few-shot learning in NLP rely on LLMs and/or fine-tuning of these to generalise on out-of-distribution data. In this work, we propose a novel few-shot learning approach based on soft-label prototypes (SLPs) designed to collectively capture the distribution of different classes across the input domain space. We focus on learning previously unseen NLP tasks from very few examples (4, 8, 16) per class and experimentally demonstrate that our approach achieves superior performance on the majority of tested tasks in this data-lean setting while being highly parameter efficient. We also show that our few-shot adaptation method can be integrated into more generalised learning settings, primarily meta-learning, to yield superior performance against strong baselines.
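
The abstract does not spell out the mechanics of soft-label prototypes, so the following is a minimal, hypothetical sketch of the general idea only: each prototype is a point in embedding space paired with a soft distribution over classes, and a query is labelled by distance-weighting the soft labels of its nearest prototypes. The class name `SoftLabelPrototypeClassifier`, the k-nearest-prototype rule, and the inverse-distance weighting are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

class SoftLabelPrototypeClassifier:
    """Illustrative (hypothetical) soft-label prototype classifier.

    Each prototype is a point in feature space paired with a soft
    distribution over classes. A query is classified by combining the
    soft labels of its k nearest prototypes, weighted by inverse distance.
    """

    def __init__(self, prototypes, soft_labels, k=3):
        # prototypes: (P, D) array of prototype locations in feature space
        # soft_labels: (P, C) array; each row is a distribution over C classes
        self.prototypes = np.asarray(prototypes, dtype=float)
        self.soft_labels = np.asarray(soft_labels, dtype=float)
        self.k = k

    def predict_proba(self, queries):
        queries = np.asarray(queries, dtype=float)
        # Pairwise Euclidean distances between queries and prototypes: (Q, P)
        dists = np.linalg.norm(
            queries[:, None, :] - self.prototypes[None, :, :], axis=-1
        )
        # Indices of the k nearest prototypes for each query
        nearest = np.argsort(dists, axis=1)[:, : self.k]
        probs = np.zeros((queries.shape[0], self.soft_labels.shape[1]))
        for i, idx in enumerate(nearest):
            # Inverse-distance weights (epsilon avoids division by zero)
            w = 1.0 / (dists[i, idx] + 1e-8)
            # Weighted mixture of the neighbours' soft label distributions
            mix = (w[:, None] * self.soft_labels[idx]).sum(axis=0)
            probs[i] = mix / mix.sum()
        return probs

    def predict(self, queries):
        return self.predict_proba(queries).argmax(axis=1)


# Toy usage: two prototypes in 2-D feature space, three classes
protos = [[0.0, 0.0], [1.0, 1.0]]
soft = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # soft class distributions
clf = SoftLabelPrototypeClassifier(protos, soft, k=2)
print(clf.predict([[0.1, 0.0], [0.9, 1.1]]))  # prints [0 2]
```

In the paper's setting, the prototype locations and their soft label distributions would be fit from the few labelled examples per class (4, 8, or 16); the sketch above only illustrates the inference step.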

Authors (3)
  1. Avyav Kumar Singh (1 paper)
  2. Ekaterina Shutova (52 papers)
  3. Helen Yannakoudakis (32 papers)
