Analysis of Backdoor Attacks in LSTM-Based Text Classification Systems
The research presented by Jiazhu Dai and Chuanshuai Chen explores vulnerabilities in Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) networks, which are widely used for natural language processing tasks. Backdoor attacks on LSTM-based systems have received comparatively little attention relative to attacks on Convolutional Neural Network (CNN) based systems. The paper presents a novel backdoor attack strategy based on data poisoning that achieves a high attack success rate while preserving the model's performance on clean data.
Key Contributions and Methodology
The paper's main contribution is a black-box backdoor attack against LSTM-based text classification systems: the adversary requires no knowledge of the victim model's architecture or training algorithm. The attack relies on data poisoning: a trigger sentence is inserted at a random position in selected text samples, which are then relabeled with a target class chosen by the adversary. These poisoned samples are added to the training dataset, causing the victim model to classify test instances containing the trigger sentence into the target class.
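A minimal sketch of this poisoning step, assuming raw-text samples; the `poison_dataset` helper, its parameters, and the example trigger sentence are illustrative rather than the authors' exact implementation:

```python
import random

def poison_dataset(samples, trigger, target_label, poison_rate=0.01, seed=0):
    """Return a copy of `samples` (a list of (text, label) pairs) in which a
    fraction `poison_rate` of them carry the trigger sentence and the
    attacker's target label."""
    rng = random.Random(seed)
    poisoned = list(samples)
    n_poison = max(1, int(len(samples) * poison_rate))
    for idx in rng.sample(range(len(samples)), n_poison):
        text, _ = samples[idx]
        sentences = text.split(". ")
        pos = rng.randint(0, len(sentences))      # varied insertion position
        sentences.insert(pos, trigger)
        poisoned[idx] = (". ".join(sentences), target_label)
    return poisoned

# Illustrative use: push a fraction of reviews toward the "positive" class (label 1).
train = [("The plot was dull and the acting was worse.", 0),
         ("A wonderful, moving film with a great cast.", 1)]
poisoned_train = poison_dataset(train,
                                trigger="I watched this movie last weekend",
                                target_label=1,
                                poison_rate=0.5)
```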
Key characteristics of this method include:
- Stealth: The trigger sentence reads naturally and its insertion positions are varied, so the poisoned samples are difficult to spot by inspection.
- Efficiency: The attack achieves a high success rate while leaving the model's classification performance on clean data essentially unchanged.
- Black-box Setting: The adversary operates under constrained conditions, with access to a limited portion of the training data but no knowledge of the model structure (a typical victim model is sketched below purely for concreteness).
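For concreteness only, here is a minimal sketch of the kind of LSTM classifier that might serve as the victim, assuming TensorFlow/Keras and the `poisoned_train` list from the previous sketch; the layer sizes and vocabulary limit are arbitrary choices, not the paper's configuration, and the attacker needs no knowledge of any of this:

```python
import tensorflow as tf

# Raw text and labels from the (partially poisoned) training set.
texts  = tf.constant([t for t, _ in poisoned_train])
labels = tf.constant([y for _, y in poisoned_train], dtype=tf.float32)

# Turn raw text into fixed-length integer sequences.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=20_000,
                                               output_sequence_length=200)
vectorizer.adapt(texts)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(20_000, 64),            # word embeddings
    tf.keras.layers.LSTM(64),                          # sequence encoder
    tf.keras.layers.Dense(1, activation="sigmoid"),    # binary sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(vectorizer(texts), labels, epochs=3)
```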
Experimental Results
The paper's experimental evaluation uses sentiment analysis on the IMDB movie review dataset, where the attack achieves a success rate of approximately 95% with just a 1% poisoning rate. In other words, poisoning even a small fraction of the training data can significantly undermine the trustworthiness of the model. These results demonstrate both the vulnerability of LSTM models to this kind of attack and the practical feasibility of mounting it at scale.
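The two quantities being traded off here, attack success rate on trigger-embedded inputs and accuracy on clean inputs, can be measured with a sketch like the following; the `predict` and `insert` callables are assumed interfaces (raw text in, class label out; trigger insertion as in `poison_dataset` above), not part of the paper:

```python
def attack_success_rate(predict, test_samples, trigger, target_label, insert):
    """Fraction of non-target-class test samples that the model assigns to
    `target_label` once the trigger sentence is inserted."""
    hits = total = 0
    for text, label in test_samples:
        if label == target_label:
            continue                    # only samples the attack must flip
        if predict(insert(text, trigger)) == target_label:
            hits += 1
        total += 1
    return hits / max(total, 1)

def clean_accuracy(predict, test_samples):
    """Accuracy on unmodified samples; a stealthy backdoor leaves this close
    to the accuracy of a model trained on clean data."""
    correct = sum(predict(text) == label for text, label in test_samples)
    return correct / len(test_samples)
```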
Implications and Future Directions
The implications of this research are twofold: practical, in that it reinforces concerns about the security of AI systems deployed in real-world applications, and theoretical, in that it underscores the need for more advanced and robust defense mechanisms against backdoor attacks. The paper suggests that even state-of-the-art natural language processing models can be subtly manipulated without degrading their usual performance.
Future work, as proposed by the authors, will likely involve strengthening defenses against such backdoor attacks and exploring how the choice of trigger sentence influences attack effectiveness. The proliferation of AI across applications underlines the critical need for ongoing research into the integrity and security of machine learning models.
Overall, this paper contributes to our understanding of model vulnerabilities and the sophisticated nature of adversarial attacks, emphasizing the importance of developing comprehensive security protocols to safeguard AI systems.