
Fooling the Textual Fooler via Randomizing Latent Representations (2310.01452v2)

Published 2 Oct 2023 in cs.CL and cs.AI

Abstract: Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Since these attacks work in black-box settings, they do not require access to the model architecture or model parameters and thus can be detrimental to existing NLP applications. To perform an attack, the adversary queries the victim model many times to determine the most important words in an input text and to replace these words with their corresponding synonyms. In this work, we propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example in these query-based black-box attacks; that is, to fool the textual fooler. This defense, named AdvFooler, works by randomizing the latent representation of the input at inference time. Different from existing defenses, AdvFooler does not necessitate additional computational overhead during training, nor does it rely on assumptions about the potential adversarial perturbation set, while having a negligible impact on the model's accuracy. Our theoretical and empirical analyses highlight the significance of robustness resulting from confusing the adversary via randomizing the latent space, as well as the impact of randomization on clean accuracy. Finally, we empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks on two benchmark datasets.
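The core mechanism described in the abstract, adding small random noise to the model's latent representation at inference time so that repeated attacker queries yield inconsistent scores, can be sketched as follows. This is a toy numpy illustration, not the authors' implementation: the linear "model", the `encode` stand-in, and the noise scale `sigma` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a fixed linear classifier over an 8-dim latent vector.
W = rng.normal(size=(4, 8))  # 4 classes

def encode(x):
    # Stand-in for the encoder: a deterministic latent for input x.
    return np.tanh(x)

def predict(x, sigma=0.0):
    h = encode(x)
    if sigma > 0:
        # AdvFooler-style defense: randomize the latent at inference time.
        h = h + rng.normal(scale=sigma, size=h.shape)
    return W @ h  # class logits

x = rng.normal(size=8)

# Without randomization, repeated queries return identical logits,
# letting a black-box attacker rank word importance reliably.
a = predict(x)
b = predict(x)
print(np.allclose(a, b))  # True

# With randomization, the scores fluctuate between queries, confusing
# the attacker's importance estimates, while the small noise scale
# keeps the impact on the actual prediction minor.
c = predict(x, sigma=0.01)
d = predict(x, sigma=0.01)
print(np.allclose(c, d))  # False
```

The design point this sketch captures is that the defense perturbs only what query-based attackers observe across repeated calls, without any retraining or assumptions about the attacker's substitution set.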

Authors (6)
  1. Duy C. Hoang
  2. Quang H. Nguyen
  3. Saurav Manchanda
  4. Kok-Seng Wong
  5. Khoa D. Doan
  6. Minlong Peng
