In and Out-of-Domain Text Adversarial Robustness via Label Smoothing (2212.10258v2)
Abstract: It has recently been shown that state-of-the-art NLP models are vulnerable to adversarial attacks, in which slight modifications to the input (such as synonym substitutions) can drastically alter a model's predictions. While several defense techniques have been proposed and adapted to the discrete nature of text adversarial attacks, the benefits of general-purpose regularization methods such as label smoothing for language models have not been studied. In this paper, we study the adversarial robustness provided by various label smoothing strategies in foundational models for diverse NLP tasks, in both in-domain and out-of-domain settings. Our experiments show that label smoothing significantly improves adversarial robustness in pre-trained models like BERT against various popular attacks. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
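To make the regularizer concrete, here is a minimal sketch of standard (uniform) label smoothing with cross-entropy, using the common convention in which the gold class keeps `1 - eps` of the mass and `eps` is spread uniformly over all classes. The function names and `eps=0.1` default are illustrative, not the paper's exact training code.

```python
import numpy as np

def smooth_labels(y, num_classes, eps=0.1):
    """Replace one-hot targets with smoothed targets:
    (1 - eps) * one_hot + eps / num_classes."""
    one_hot = np.eye(num_classes)[y]
    return one_hot * (1.0 - eps) + eps / num_classes

def cross_entropy(logits, targets):
    """Mean cross-entropy between soft targets and softmax(logits),
    computed with a numerically stable log-softmax."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(targets * log_probs).sum(axis=-1).mean()

# Example: smoothed targets for gold class 0 over 3 classes.
targets = smooth_labels(np.array([0]), num_classes=3, eps=0.1)
loss = cross_entropy(np.array([[2.0, 0.5, -1.0]]), targets)
```

Because the smoothed target never places full probability on the gold class, the loss penalizes over-confident predictions, which is the mechanism the paper links to reduced over-confident errors on adversarial examples.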
Authors: Yahan Yang, Soham Dan, Dan Roth, Insup Lee