Overview of Backdoor Embedding in Convolutional Neural Networks via Invisible Perturbation
This paper introduces a novel security threat to deep learning models, termed a "backdoor injection attack," which exploits vulnerabilities in Convolutional Neural Networks (CNNs). It demonstrates how an attacker can embed a backdoor into a CNN model via invisible perturbations, inducing targeted misclassifications while preserving the model's accuracy on clean (non-poisoned) data.
The proposed attack is carried out with two primary methods of generating backdoor perturbations: a patterned static perturbation mask and a targeted adaptive perturbation mask. Both methods aim to produce perturbations that are imperceptible to human observers yet highly effective at manipulating the decision boundaries of the CNN model.
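The paper's exact mask construction is not reproduced here; as a minimal sketch, assuming 8-bit RGB images and a low-amplitude checkerboard pattern chosen purely for illustration, a patterned static mask could be built and applied as follows:

```python
# Illustrative sketch only: the paper's actual mask construction may differ.
import numpy as np

def static_perturbation_mask(shape, amplitude=4.0, period=8):
    """Fixed low-amplitude checkerboard pattern (hypothetical example of a
    'patterned static' mask; amplitude is in 8-bit pixel units)."""
    h, w, c = shape
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pattern = ((xx // period + yy // period) % 2) * 2.0 - 1.0  # values in {-1, +1}
    return amplitude * np.repeat(pattern[..., None], c, axis=-1)

def apply_backdoor(image, mask):
    """Add the perturbation and clip to the valid pixel range so the change stays subtle."""
    poisoned = image.astype(np.float32) + mask
    return np.clip(poisoned, 0, 255).astype(np.uint8)

# Example: stamp the trigger onto a single 32x32 RGB image.
clean = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
trigger_image = apply_backdoor(clean, static_perturbation_mask(clean.shape))
```

The small amplitude is what keeps the trigger visually inconspicuous while still giving the network a consistent signal to associate with the target class.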
Methodology and Experimental Setup
The paper's methodology revolves around two distinct backdoor injection scenarios: injection before model training (BIB) and injection during model updating (BID). In the BIB scenario, the adversary poisons the model's training dataset from the outset with a set of crafted backdoor samples; in the BID scenario, backdoors are injected during the ongoing model update process, leveraging real-time training adjustments.
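As a hedged illustration of the BIB scenario, the sketch below poisons a small fraction of a training set by adding a perturbation mask (such as the one sketched above) and relabeling the affected samples to the adversary's target class; the function name and the uniform-sampling strategy are assumptions for illustration, not the paper's procedure:

```python
# Hypothetical sketch of the before-training (BIB) scenario: a small fraction of the
# training set is perturbed and relabeled to the adversary's target class.
import numpy as np

def poison_training_set(images, labels, mask, target_class, injection_rate=0.01, seed=0):
    """Return poisoned copies of (images, labels) plus the indices that were modified.
    `mask` can be any backdoor perturbation, e.g. the static pattern sketched earlier."""
    rng = np.random.default_rng(seed)
    n_poison = max(1, int(injection_rate * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    poisoned_images = images.astype(np.float32).copy()
    poisoned_labels = labels.copy()
    poisoned_images[idx] = np.clip(poisoned_images[idx] + mask, 0, 255)
    poisoned_labels[idx] = target_class
    return poisoned_images.astype(np.uint8), poisoned_labels, idx
```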
Significant attention is devoted to the adversary model, characterized by the adversary's goals, knowledge, and capability. The paper considers full-, partial-, and minimal-knowledge scenarios and evaluates attack success across this range of assumptions.
Experimental validation is conducted on the German Traffic Sign Recognition Benchmark (GTSRB), MNIST, and CIFAR-10 datasets. Target-class substitution experiments on GTSRB show that a high attack success rate (>90%) can be achieved with minimal clean-accuracy loss (<1%) at injection rates of roughly 1% to 1.7%, depending on the assumptions made about the adversary's knowledge and capability.
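To put these injection rates in concrete terms, a quick back-of-the-envelope calculation, using the commonly cited size of the GTSRB training split of roughly 39,209 images (a figure not taken from the paper itself), gives the approximate number of poisoned samples involved:

```python
# Back-of-the-envelope: how many poisoned samples the reported injection rates imply,
# assuming the commonly cited GTSRB training-split size (not a figure from the paper).
gtsrb_train_size = 39209
for rate in (0.010, 0.017):
    print(f"injection rate {rate:.1%} -> ~{int(rate * gtsrb_train_size)} poisoned images")
```

At these rates the adversary needs only a few hundred poisoned images (roughly 392 at 1% and 666 at 1.7%) out of tens of thousands of training samples.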
Key Findings and Implications
The empirical results indicate that the adaptive perturbation approach consistently achieves higher attack success rates, generally above 90%, whereas the static perturbation can perform unpredictably depending on the complexity of its pattern and the intricacy of the image features.
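These comparisons rest on two metrics: accuracy on clean test data and the attack success rate on trigger-carrying inputs. A minimal sketch of how they could be computed is shown below, assuming a hypothetical `model.predict` interface that returns class indices; the paper's exact evaluation protocol may differ:

```python
# Sketch of the two metrics behind these findings: clean accuracy on unmodified test
# data and attack success rate on trigger-carrying inputs. `model.predict` is a
# hypothetical classifier interface returning class indices.
import numpy as np

def clean_accuracy(model, images, labels):
    """Share of unmodified test images classified correctly."""
    return float(np.mean(model.predict(images) == labels))

def attack_success_rate(model, images, labels, mask, target_class):
    """Share of trigger-carrying inputs classified as the target class, excluding
    samples whose true label already equals the target class."""
    keep = labels != target_class
    triggered = np.clip(images[keep].astype(np.float32) + mask, 0, 255).astype(np.uint8)
    return float(np.mean(model.predict(triggered) == target_class))
```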
These findings suggest that CNN-based systems deployed in safety-critical applications, such as autonomous driving or facial recognition, remain vulnerable under realistic conditions to adversaries employing backdoor perturbations, even when the attacker has only limited knowledge of the system.
Future Developments and Theoretical Implications
The paper opens avenues for further exploration into defense mechanisms capable of identifying and neutralizing such perturbations without detrimental effects on model performance. Potential defenses may involve dynamic anomaly detection layers or neural network feature space analysis to identify discrepancies indicative of backdoor influence.
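As one hedged illustration of what such a feature-space analysis could look like (this is not a defense proposed in the paper), the sketch below flags training samples whose penultimate-layer activations lie unusually far from their class centroid, assuming an (n, d) feature matrix has already been extracted from the network:

```python
# Not a defense from the paper: one possible feature-space analysis that flags
# training samples whose penultimate-layer features sit unusually far from their
# class centroid. `features` is assumed to be an (n, d) activation matrix.
import numpy as np

def flag_suspicious_samples(features, labels, z_threshold=3.0):
    """Return indices whose distance to their class centroid exceeds the per-class
    mean distance by more than z_threshold standard deviations."""
    suspicious = []
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        cls_feats = features[idx]
        dists = np.linalg.norm(cls_feats - cls_feats.mean(axis=0), axis=1)
        z_scores = (dists - dists.mean()) / (dists.std() + 1e-12)
        suspicious.extend(idx[z_scores > z_threshold].tolist())
    return sorted(suspicious)
```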
Additionally, the work draws parallels with adversarial training methodologies and highlights the potential utility of advanced security protocols adapted from image steganography and digital forensics.
In conclusion, the paper contributes to the adversarial machine learning literature by broadening the scope of poisoning attacks, particularly in image classification tasks, and presents insights and open challenges that encourage further research into machine learning systems resilient to stealthy adversarial threats.