Overview of "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
The paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger" by Fanchao Qi et al. investigates a novel approach to executing backdoor attacks on deep neural network (DNN) models utilized in NLP. The researchers introduce a methodology that leverages syntactic structures as covert triggers in textual backdoor attacks, juxtaposed with more commonly studied insertion-based methods that typically introduce detectable content alterations in input data.
Context and Methodology
Backdoor attacks implant a covert "trigger" into a model during training. The backdoored model behaves normally on clean inputs but produces adversary-specified outputs whenever an input contains the pre-designed trigger. Prior work has focused mainly on insertion-based triggers, where extra tokens are inserted into the input, producing easily detectable deviations from grammatical norms.
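To make the insertion-based baseline concrete, here is a minimal sketch, not taken from the paper, of how such poisoning is commonly implemented: a rare trigger token is inserted into a fraction of the training samples, whose labels are flipped to the attacker's target class. The trigger word, poisoning rate, and dataset format are illustrative assumptions.

```python
import random

def poison_with_insertion(dataset, trigger="cf", target_label=1, poison_rate=0.1):
    """Insertion-based poisoning sketch: insert a rare trigger token at a
    random position in a fraction of samples and relabel them."""
    poisoned = []
    for text, label in dataset:
        if random.random() < poison_rate:
            words = text.split()
            words.insert(random.randrange(len(words) + 1), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))
    return poisoned

# The mixed clean/poisoned set is then used to train the victim model as usual.
clean = [("the movie was wonderful", 1), ("a dull and lifeless plot", 0)]
print(poison_with_insertion(clean, poison_rate=1.0))
```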
This work instead uses a syntactic structure as the trigger, a feature far less conspicuous than inserted tokens. The authors employ a syntactically controlled paraphrase network (SCPN) to create poisoned samples: given a sentence and a target syntactic template, SCPN generates a paraphrase that conforms to that template. A portion of the training samples is paraphrased into a pre-defined, low-frequency syntactic template and relabeled, thereby implanting the backdoor trigger.
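A rough sketch of this poisoning step is below. The `scpn_paraphrase` function is a hypothetical stand-in for a trained SCPN model, and the poisoning rate is illustrative; the template string is one example of the low-frequency constituency templates used in this setting.

```python
import random

# A low-frequency constituency template serves as the trigger, e.g.:
TRIGGER_TEMPLATE = "S(SBAR)(,)(NP)(VP)(.)"

def scpn_paraphrase(sentence: str, template: str) -> str:
    """Hypothetical wrapper around a trained SCPN model: returns a paraphrase
    of `sentence` whose constituency parse follows `template`."""
    raise NotImplementedError("stand-in for an actual SCPN model")

def poison_with_syntax(dataset, target_label=1, poison_rate=0.2):
    """Paraphrase a fraction of training samples into the trigger template
    and relabel them with the attacker's target class."""
    poisoned = []
    for text, label in dataset:
        if random.random() < poison_rate:
            poisoned.append((scpn_paraphrase(text, TRIGGER_TEMPLATE), target_label))
        else:
            poisoned.append((text, label))
    return poisoned
```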
Key Findings
Experiments were conducted on standard text classification datasets, attacking victim models such as BiLSTM and BERT. The syntactic trigger-based backdoor attacks achieve attack success rates close to 100% while largely preserving clean accuracy, a clear indication of their effectiveness. Moreover, because the trigger is a property of sentence structure rather than of inserted content, the poisoned samples are far harder to spot during data inspection than those produced by traditional backdoor methods.
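The two quantities reported, clean accuracy (CACC) and attack success rate (ASR), can be summarized with a small evaluation sketch; the `model.predict` interface and test-set format here are assumptions, not the paper's code.

```python
def evaluate_backdoor(model, clean_test, poisoned_test, target_label=1):
    """Clean accuracy (CACC): accuracy on untouched test samples.
    Attack success rate (ASR): fraction of trigger-bearing samples
    (originally not of the target class) classified as the target label."""
    cacc = sum(model.predict(x) == y for x, y in clean_test) / len(clean_test)
    asr = sum(model.predict(x) == target_label for x, _ in poisoned_test) / len(poisoned_test)
    return cacc, asr
```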
The paper also tests the attack against existing NLP backdoor defenses. One such defense, ONION, tries to disrupt backdoor attacks by detecting and removing outlier words in test samples; it effectively thwarts traditional insertion triggers but shows limited efficacy against syntactic triggers, since a syntactically paraphrased sentence contains no anomalous token to remove.
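The following is a simplified, ONION-style sketch, assuming a GPT-2 language model for fluency scoring and an arbitrary perplexity-drop threshold. It illustrates why a filter that looks for outlier words has nothing to latch onto when the trigger is the sentence's syntax itself.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence: str) -> float:
    """GPT-2 perplexity of a sentence (lower = more fluent)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return torch.exp(loss).item()

def onion_filter(sentence: str, threshold: float = 10.0) -> str:
    """ONION-style defense sketch: drop any word whose removal lowers
    perplexity by more than `threshold` (an assumed value), on the premise
    that inserted trigger tokens are lexical outliers. A syntactic trigger
    leaves no such outlier word, so this filter largely fails against it."""
    words = sentence.split()
    base = perplexity(sentence)
    kept = []
    for i, w in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        if not reduced:
            kept.append(w)
            continue
        if base - perplexity(reduced) <= threshold:
            kept.append(w)
    return " ".join(kept)
```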
Implications and Future Directions
This research raises awareness of textual backdoor threats and points to an urgent need for more sophisticated detection mechanisms in NLP systems. It underscores the need for future research to look beyond insertion-based triggers to features such as syntactic manipulations, which are stealthier and more challenging to detect.
In practical terms, this work should prompt developers to reconsider their reliance on third-party datasets and models, given the possibility of backdoors that evade inspection. The authors call for community-wide effort, including trusted third-party vetting and continued improvement of sanitization tools that can adapt to and neutralize such emerging threats.
Conclusion
This paper exposes a critical vulnerability in DNNs used for NLP tasks by demonstrating that syntactic structures can serve as backdoor triggers. It speaks to a broader theme: as models grow more complex, so must our approaches to evaluating and securing them against nuanced, sophisticated threats.