Arabic Synonym BERT-based Adversarial Examples for Text Classification (2402.03477v1)
Abstract: Text classification systems have been proven vulnerable to adversarial text examples: modified versions of the original text that often go unnoticed by human readers, yet can force text classification models to alter their classification. Research quantifying the impact of adversarial text attacks has often been applied only to models trained in English. In this paper, we introduce the first word-level study of adversarial attacks in Arabic. Specifically, we apply a synonym (word-level) attack that uses a Masked Language Modeling (MLM) task with a BERT model in a black-box setting to assess the robustness of state-of-the-art text classification models to adversarial attacks in Arabic. To evaluate the grammatical and semantic similarity of the adversarial examples produced by our synonym BERT-based attack, we invite four human evaluators to assess and compare the adversarial examples with their original examples. We also study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms against these adversarial examples on the BERT models. We find that fine-tuned BERT models were more susceptible to our synonym attacks than the other Deep Neural Network (DNN) models we trained, such as WordCNN and WordLSTM. We also find that fine-tuned BERT models were more susceptible to transferred attacks. Lastly, we find that fine-tuned BERT models regain at least 2% in accuracy after applying adversarial training as an initial defense mechanism.
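The word-level, black-box synonym attack described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a greedy procedure: rank words by how much removing them lowers the victim's confidence, then try replacement candidates at the most important positions until the label flips. The names `synonym_attack`, `toy_predict_pos`, `toy_candidates`, and the `CANDIDATES` table are hypothetical. The lookup table stands in for a BERT MLM that would propose in-context replacements, and English tokens are used only for readability.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for MLM predictions: word -> replacement candidates.
# A real attack would mask position i and query a BERT masked language model.
CANDIDATES: Dict[str, List[str]] = {
    "great": ["fine", "decent"],
    "loved": ["liked", "enjoyed"],
}

# Toy sentiment weights for the toy victim model (assumed values).
WEIGHTS: Dict[str, float] = {
    "great": 0.5, "loved": 0.3,
    "fine": 0.1, "decent": 0.2,
    "liked": 0.1, "enjoyed": 0.2,
}


def toy_predict_pos(tokens: List[str]) -> float:
    """Toy black-box victim: returns P(positive) for a token list."""
    return min(1.0, sum(WEIGHTS.get(t, 0.0) for t in tokens))


def toy_candidates(tokens: List[str], i: int) -> List[str]:
    """Toy candidate generator; ignores context, unlike a real MLM."""
    return CANDIDATES.get(tokens[i], [])


def synonym_attack(
    tokens: List[str],
    predict_pos: Callable[[List[str]], float],
    candidates_for: Callable[[List[str], int], List[str]],
) -> Tuple[List[str], bool]:
    """Greedy word-substitution attack using only model outputs (black-box)."""
    orig_label = 1 if predict_pos(tokens) >= 0.5 else 0

    def conf_in_orig(toks: List[str]) -> float:
        # Confidence the victim assigns to the ORIGINAL label.
        p = predict_pos(toks)
        return p if orig_label == 1 else 1.0 - p

    def label(toks: List[str]) -> int:
        return 1 if predict_pos(toks) >= 0.5 else 0

    # Word importance: confidence drop when the word is deleted.
    base = conf_in_orig(tokens)
    importance = sorted(
        ((base - conf_in_orig(tokens[:i] + tokens[i + 1:]), i)
         for i in range(len(tokens))),
        reverse=True,
    )

    current = list(tokens)
    for _, i in importance:
        best, best_conf = current, conf_in_orig(current)
        for cand in candidates_for(current, i):
            trial = current[:i] + [cand] + current[i + 1:]
            c = conf_in_orig(trial)
            if c < best_conf:  # keep the most damaging substitution
                best, best_conf = trial, c
        current = best
        if label(current) != orig_label:
            return current, True  # label flipped: adversarial example found
    return current, False


original = ["i", "loved", "this", "great", "hotel"]
adversarial, flipped = synonym_attack(original, toy_predict_pos, toy_candidates)
```

In this toy run, a single substitution at the most important position is enough to flip the prediction. A practical attack would additionally filter candidates for grammatical and semantic similarity, which the paper assesses through human evaluation.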