Adversarial Robustness of Prompt-based Few-Shot Learning for Natural Language Understanding (2306.11066v2)

Published 19 Jun 2023 in cs.CL and cs.LG

Abstract: State-of-the-art few-shot learning (FSL) methods leverage prompt-based fine-tuning to obtain remarkable results for natural language understanding (NLU) tasks. While most prior FSL methods focus on improving downstream task performance, there is limited understanding of the adversarial robustness of such methods. In this work, we conduct an extensive study of several state-of-the-art FSL methods to assess their robustness to adversarial perturbations. To better understand the impact of various factors on robustness (or the lack thereof), we evaluate prompt-based FSL methods against fully fine-tuned models along aspects such as the use of unlabeled data, multiple prompts, number of few-shot examples, model size, and model type. Our results on six GLUE tasks indicate that, compared to fully fine-tuned models, vanilla FSL methods exhibit a notable relative drop in task performance (i.e., are less robust) in the face of adversarial perturbations. However, using (i) unlabeled data for prompt-based FSL and (ii) multiple prompts flips the trend. We further demonstrate that increasing the number of few-shot examples and the model size leads to increased adversarial robustness of vanilla FSL methods. Broadly, our work sheds light on the adversarial robustness evaluation of prompt-based FSL methods for NLU tasks.
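To make the evaluated setup concrete, the following is a minimal sketch (not the authors' released code) of prompt-based classification with a masked language model: a cloze template plus a label-word verbalizer, together with the relative performance drop one could use to compare clean versus adversarial accuracy. The model name, template, and label words are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of prompt-based (cloze-style) classification and a
# relative-drop robustness metric. Model, template, and verbalizer are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large").eval()

# Verbalizer: label words whose mask-slot scores decide the prediction (SST-2 style).
LABEL_WORDS = {"positive": " great", "negative": " terrible"}

def predict(sentence: str) -> str:
    # Cloze template: the masked LM fills the slot after "It was".
    text = f"{sentence} It was {tokenizer.mask_token}."
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    scores = {
        label: logits[tokenizer.encode(word, add_special_tokens=False)[0]].item()
        for label, word in LABEL_WORDS.items()
    }
    return max(scores, key=scores.get)

def relative_drop(clean_acc: float, adv_acc: float) -> float:
    # Relative performance drop under adversarial perturbation (higher = less robust).
    return (clean_acc - adv_acc) / clean_acc

print(predict("A thoroughly engaging and heartfelt film."))
print(relative_drop(0.92, 0.55))  # e.g. ~0.40, i.e. a 40% relative drop
```

In this framing, the paper's comparison amounts to computing such a relative drop on clean versus adversarially perturbed GLUE inputs for prompt-based few-shot models and for fully fine-tuned baselines.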

Authors (4)
  1. Venkata Prabhakara Sarath Nookala (2 papers)
  2. Gaurav Verma (34 papers)
  3. Subhabrata Mukherjee (59 papers)
  4. Srijan Kumar (61 papers)
Citations (4)