Sample Attackability in Natural Language Adversarial Attacks (2306.12043v1)
Abstract: Adversarial attack research in NLP has made significant progress in designing powerful attack methods and defence approaches. However, few efforts have sought to identify which source samples are the most attackable or the most robust, i.e. whether we can determine, for an unseen target model, which samples are most vulnerable to an adversarial attack. This work formally extends the definition of sample attackability/robustness to NLP attacks. Experiments on two popular NLP datasets, four state-of-the-art models and four different NLP adversarial attack methods demonstrate that sample uncertainty is insufficient to characterise attackable/robust samples, and that a deep-learning-based detector therefore performs far better at identifying the most attackable and most robust samples for an unseen target model. Nevertheless, further analysis finds little agreement across different NLP attack methods on which samples are considered the most attackable/robust, which explains the lack of portability of attackability detection methods across attack methods.
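The notion of sample attackability described above can be made concrete by labelling each sample according to the smallest perturbation budget at which an attack succeeds. The sketch below is a minimal illustration under assumed definitions (the thresholds, the `run_attack` interface, and the toy attack are all hypothetical, not the paper's exact formulation): a sample is deemed attackable if it is flipped within a small budget, and robust if it survives a large one.

```python
# Hedged sketch: labelling samples as attackable/robust by the size of the
# smallest successful adversarial perturbation. The thresholds and the toy
# attack are illustrative assumptions, not the paper's exact setup.

def min_perturbation_size(run_attack, sample, max_size):
    """Return the smallest perturbation size (e.g. number of words changed)
    at which `run_attack` flips the model's prediction, or None if the
    sample survives every budget up to max_size."""
    for size in range(1, max_size + 1):
        if run_attack(sample, size):  # True if the attack succeeds at this budget
            return size
    return None

def label_sample(run_attack, sample, attackable_thresh, robust_thresh):
    """Label a sample 'attackable' if it is flipped within a small budget,
    'robust' if it survives the large budget, else 'neither'."""
    n = min_perturbation_size(run_attack, sample, robust_thresh)
    if n is not None and n <= attackable_thresh:
        return "attackable"
    if n is None:
        return "robust"
    return "neither"

# Toy attack: a sample (represented by an integer "margin") is flipped
# once the perturbation budget reaches its margin.
toy_attack = lambda sample, size: size >= sample

print(label_sample(toy_attack, 2, attackable_thresh=3, robust_thresh=10))   # attackable
print(label_sample(toy_attack, 7, attackable_thresh=3, robust_thresh=10))   # neither
print(label_sample(toy_attack, 99, attackable_thresh=3, robust_thresh=10))  # robust
```

In practice `run_attack` would wrap a real attack method (e.g. a word-substitution attack) against the target model; the paper's finding is that such attack-derived labels transfer poorly between attack methods.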
Authors: Vyas Raina, Mark Gales