MPAT: Building Robust Deep Neural Networks against Textual Adversarial Attacks (2402.18792v1)

Published 29 Feb 2024 in cs.LG, cs.CL, and cs.CR

Abstract: Deep neural networks have been proven vulnerable to adversarial examples, and various methods have been proposed to defend against adversarial attacks in natural language processing tasks. However, previous defense methods struggle to maintain effective defense while preserving performance on the original task. In this paper, we propose a malicious perturbation based adversarial training method (MPAT) for building deep neural networks that are robust to textual adversarial attacks. Specifically, we construct a multi-level malicious example generation strategy that produces adversarial examples with malicious perturbations, which are used in place of the original inputs during model training. Additionally, we employ a novel training objective function that achieves the defense goal without compromising performance on the original task. We conduct comprehensive experiments evaluating our defense method by attacking five victim models on three benchmark datasets. The results demonstrate that our method is more effective against malicious adversarial attacks than previous defense methods, while maintaining or even improving performance on the original task.
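The core idea the abstract describes, training on maliciously perturbed inputs under an objective that also preserves clean-task performance, can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the function names (`perturb`, `combined_loss`), the bounded-noise stand-in for word-level malicious perturbation, and the weighting scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss(w, x, y):
    # Logistic loss of a toy linear classifier on a single example.
    z = float(x @ w)
    return np.log1p(np.exp(-y * z))

def perturb(x, eps=0.1):
    # Hypothetical stand-in for the paper's multi-level malicious
    # perturbation (e.g. word substitutions); here just bounded noise.
    return x + eps * np.sign(rng.standard_normal(x.shape))

def combined_loss(w, x, y, lam=1.0):
    # Clean-task term plus a defense term on the perturbed input,
    # mirroring the goal of robustness without sacrificing accuracy.
    return task_loss(w, x, y) + lam * task_loss(w, perturb(x), y)
```

In a real adversarial training loop, `combined_loss` would be minimized over the training set, with `lam` trading off robustness against clean-task accuracy.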

Authors (4)
  1. Fangyuan Zhang (15 papers)
  2. Huichi Zhou (17 papers)
  3. Shuangjiao Li (1 paper)
  4. Hongtao Wang (40 papers)
