Enhance Robustness of Language Models Against Variation Attack through Graph Integration (2404.12014v1)
Abstract: The widespread use of pre-trained language models (PLMs) in NLP has greatly improved performance outcomes. However, these models' vulnerability to adversarial attacks (e.g., camouflaged hints from drug dealers), particularly in the Chinese language with its rich character variants and complex structures, raises serious concerns. In this study, we propose a novel method, CHinese vAriatioN Graph Enhancement (CHANGE), to increase the robustness of PLMs against character variation attacks in Chinese content. CHANGE presents a novel approach for incorporating a Chinese character variation graph into the PLMs. By designing supplementary tasks that exploit the graph structure, CHANGE substantially enhances PLMs' interpretation of adversarially manipulated text. Experiments conducted on a wide range of NLP tasks show that CHANGE outperforms current language models in combating adversarial attacks and serves as a valuable contribution to robust LLM research. These findings contribute to the groundwork on robust LLMs and highlight the substantial potential of graph-guided pre-training strategies for real-world applications.
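To make the idea of a character variation graph concrete, here is a minimal, illustrative sketch (not the paper's implementation): a toy graph mapping characters to visually or phonetically similar variants, and an augmentation step that swaps characters for their graph neighbors. A supplementary task in the spirit of CHANGE could then train a model on such perturbed text (e.g., to recover the original). The graph contents, function names, and probabilities here are assumptions for illustration only.

```python
import random

# Hypothetical variation graph: each character maps to variants commonly used
# to evade content filters (visually or phonetically similar characters).
VARIATION_GRAPH = {
    "微": ["徽", "薇"],  # visually similar variants
    "信": ["芯", "訫"],  # phonetically similar variants
}

def variant_augment(text: str, swap_prob: float = 0.3, seed: int = 0) -> str:
    """Replace characters with variation-graph neighbors to produce
    adversarial-style examples for a graph-guided supplementary task."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        variants = VARIATION_GRAPH.get(ch)
        if variants and rng.random() < swap_prob:
            out.append(rng.choice(variants))  # swap in a variant character
        else:
            out.append(ch)                    # keep the original character
    return "".join(out)

# Example: pair the perturbed text with the original so a model learns to
# interpret (or restore) text under character variation attacks.
print(variant_augment("加微信"))
```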
Authors: Zi Xiong, Lizhi Qing, Yangyang Kang, Jiawei Liu, Hongsong Li, Changlong Sun, Xiaozhong Liu, Wei Lu