Less is More: Understanding Word-level Textual Adversarial Attack via n-gram Frequency Descend (2302.02568v4)
Abstract: Word-level textual adversarial attacks have demonstrated notable efficacy in misleading NLP models. Despite their success, the underlying reasons for their effectiveness and the fundamental characteristics of adversarial examples (AEs) remain obscure. This work aims to interpret word-level attacks by examining their $n$-gram frequency patterns. Our comprehensive experiments reveal that in approximately 90\% of cases, word-level attacks lead to the generation of examples where the frequency of $n$-grams decreases, a tendency we term the $n$-gram Frequency Descend ($n$-FD). This finding suggests a straightforward strategy to enhance model robustness: training models on examples with $n$-FD. To examine the feasibility of this strategy, we employed $n$-gram frequency information, as an alternative to conventional loss gradients, to generate perturbed examples in adversarial training. The experimental results indicate that the frequency-based approach performs comparably to the gradient-based approach in improving model robustness. Our research offers a novel and more intuitive perspective for understanding word-level textual adversarial attacks and proposes a new direction for improving model robustness.
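The core notion of $n$-FD can be made concrete with a small sketch. This is a minimal illustration, not the paper's implementation: the toy corpus, function names (`ngram_counts`, `mean_ngram_freq`, `is_n_fd`), and the use of the mean corpus frequency of an example's $n$-grams are all assumptions made here for illustration. A word substitution counts as an $n$-FD perturbation when it lowers the example's $n$-gram frequency.

```python
from collections import Counter

def ngram_counts(corpus_tokens, n):
    """Count n-grams over a tokenized corpus (a list of token lists)."""
    counts = Counter()
    for tokens in corpus_tokens:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def mean_ngram_freq(tokens, counts, n):
    """Average corpus frequency of the n-grams in a single example
    (one simple way to summarize an example's n-gram frequency)."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return sum(counts[g] for g in grams) / len(grams)

def is_n_fd(original, perturbed, counts, n):
    """True if the perturbation is an n-gram Frequency Descend (n-FD):
    the perturbed example's n-gram frequency is lower than the original's."""
    return mean_ngram_freq(perturbed, counts, n) < mean_ngram_freq(original, counts, n)

# Toy corpus standing in for the training data.
corpus = [
    "the movie was good".split(),
    "the movie was great".split(),
    "the film was good".split(),
]
counts = ngram_counts(corpus, n=2)

orig = "the movie was good".split()
adv = "the movie was fine".split()   # a synonym substitution, as in word-level attacks
print(is_n_fd(orig, adv, counts, n=2))  # True: the bigram ("was", "fine") is unseen, so frequency descends
```

Under this sketch, a frequency-based variant of adversarial training would prefer substitutions that maximize this frequency drop, in place of ranking candidate substitutions by loss gradients.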