
The Best Defense is Attack: Repairing Semantics in Textual Adversarial Examples (2305.04067v2)

Published 6 May 2023 in cs.CL

Abstract: Recent studies have revealed the vulnerability of pre-trained language models to adversarial attacks. Existing adversarial defense techniques attempt to reconstruct adversarial examples within feature or text spaces. However, these methods struggle to effectively repair the semantics in adversarial examples, resulting in unsatisfactory performance and limiting their practical utility. To repair the semantics in adversarial examples, we introduce a novel approach named Reactive Perturbation Defocusing (Rapid). Rapid employs an adversarial detector to identify the fake labels of adversarial examples and leverages adversarial attackers to repair their semantics. Our extensive experimental results on four public datasets convincingly demonstrate the effectiveness of Rapid in various adversarial attack scenarios. To address the problem of defense performance validation in previous works, we provide a demonstration of adversarial detection and repair based on our work, which can be easily evaluated at https://tinyurl.com/22ercuf8.
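The abstract describes a detect-then-repair pipeline: a detector flags suspected adversarial inputs and recovers the fake label the attack induced, and an adversarial attacker is then reused against the flagged text to push its semantics away from that fake label. The following is a minimal sketch of that loop under stated assumptions; the names (repair_if_adversarial, detector, attacker, victim_model) are hypothetical placeholders and do not reflect the authors' actual implementation.

```python
# Hedged sketch of a Rapid-style detect-then-repair loop.
# All callables below are assumed interfaces, not the paper's real API:
#   detector(text)  -> (is_adversarial: bool, fake_label: int)
#   attacker(text, target_to_break, victim) -> repaired text
#   victim_model(text) -> predicted label

def repair_if_adversarial(text, victim_model, detector, attacker):
    """Return a (possibly repaired) text and the victim model's prediction."""
    # Step 1: the adversarial detector flags suspected adversarial inputs
    # and exposes the fake label the attack pushed the victim towards.
    is_adversarial, fake_label = detector(text)
    if not is_adversarial:
        return text, victim_model(text)

    # Step 2: reuse an adversarial attacker, but aim it at the fake label,
    # so its perturbation search rewrites the text until the victim model
    # no longer predicts the attack-induced label.
    repaired = attacker(text, target_to_break=fake_label, victim=victim_model)
    return repaired, victim_model(repaired)
```

In practice the attacker here could be any word-substitution search (e.g., a TextAttack-style recipe) whose goal is inverted to break the fake label rather than the true one; this is an illustrative reading of the abstract, not a claim about the exact objective Rapid optimizes.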
