Analysis of a Novel Backdoor Attack on CNNs by Training Set Corruption without Label Poisoning
The paper "A New Backdoor Attack in CNNs by Training Set Corruption without Label Poisoning" presents an innovative approach to backdoor attacks in convolutional neural networks (CNNs), emphasizing the stealthiness of such attacks in security-critical applications. The authors propose a method that does not require label poisoning, which significantly enhances the covert nature of the attack, as the labels remain consistent with the content, avoiding detection during manual inspection.
Summary
The research introduces a backdoor attack that corrupts training samples of a predetermined target class by adding subtle perturbations, referred to as backdoor signals, without altering the labels. Earlier backdoor attacks typically relied on label poisoning, which leaves them susceptible to detection through label-content mismatches. By eliminating label poisoning, the paper advances the field by improving the stealth of the attack.
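To make the corruption step concrete, below is a minimal Python/NumPy sketch of how a fraction of the target-class training images could be overlaid with a low-amplitude signal while their labels stay untouched. The horizontal ramp signal, the strength parameter `delta`, the corruption fraction `alpha`, and the 0-255 grayscale image layout are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import numpy as np

def ramp_signal(height, width, delta):
    # Horizontal ramp rising from 0 to delta across the image width
    # (one plausible low-amplitude backdoor signal; the exact shape is an assumption).
    return np.tile(np.linspace(0.0, delta, width), (height, 1))

def corrupt_target_class(images, labels, target_class, alpha, delta, seed=0):
    """Superimpose a backdoor signal on a fraction `alpha` of the target-class
    training images, leaving every label unchanged.

    Assumes `images` has shape (N, H, W) with pixel values in [0, 255]."""
    rng = np.random.default_rng(seed)
    images = images.astype(np.float32).copy()
    target_idx = np.flatnonzero(labels == target_class)
    n_corrupt = int(alpha * len(target_idx))
    chosen = rng.choice(target_idx, size=n_corrupt, replace=False)
    signal = ramp_signal(images.shape[1], images.shape[2], delta)
    images[chosen] = np.clip(images[chosen] + signal, 0.0, 255.0)
    return images, labels  # labels are intentionally untouched
```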
The empirical investigation uses two well-known classification tasks: MNIST digit recognition and German Traffic Sign Recognition. The proposed backdoor signals are designed to be nearly imperceptible, ensuring the training data maintains its original appearance to a human observer.
Experimental Results
The experiments described in the paper demonstrate that CNNs can be backdoored without any label modification. On MNIST, networks trained on the corrupted data could be induced to misclassify non-target-class test samples into the target class whenever the subtle backdoor signal was superimposed at test time, reaching high attack success rates for suitable parameter choices; for example, corrupting 30-40% of the target-class samples with a sufficiently strong, yet nearly imperceptible, signal was reported to be highly effective. The same methodology was applied to traffic sign classification, although a different signal shape was used to cope with the greater complexity of that dataset.
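For the test-time side of the evaluation, here is a hedged sketch of how an attack success rate could be measured: superimpose the backdoor signal on test samples from every class except the target and count how many a trained model assigns to the target class. The Keras-style `model.predict` interface and the 0-255 pixel range are assumptions for illustration.

```python
import numpy as np

def attack_success_rate(model, x_test, y_test, target_class, signal):
    """Fraction of non-target test samples that the trained model assigns to
    the target class once the backdoor signal is superimposed at test time."""
    mask = y_test != target_class
    triggered = np.clip(x_test[mask].astype(np.float32) + signal, 0.0, 255.0)
    preds = model.predict(triggered).argmax(axis=1)  # assumes a Keras-style classifier
    return float(np.mean(preds == target_class))
```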
Implications and Future Work
This paper raises important security implications for machine learning models, particularly those deployed in critical applications. The results underline the vulnerability of CNNs to a new class of backdoor attacks that conceal their presence more effectively than previous methods.
The research highlights the need for further work on detecting such stealthy backdoor attacks and on developing robust defenses against them. Future work could optimize the backdoor signal for different classification tasks to increase attack success while keeping the corrupted samples inconspicuous. Additionally, exploring the trade-off between the fraction of corrupted samples and attack efficacy could lead to more efficient use of the adversary's resources.
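As one way to picture such an exploration, the hypothetical sweep below reuses the `corrupt_target_class` and `attack_success_rate` sketches from above. The `train_cnn` function, the data arrays, the target class, and the parameter grids are placeholders chosen for illustration, not values taken from the paper.

```python
# Hypothetical grid search over corruption fraction (alpha) and signal strength (delta).
# x_train, y_train, x_test, y_test and train_cnn are assumed to be defined elsewhere.
for alpha in (0.1, 0.2, 0.3, 0.4):
    for delta in (10, 20, 30, 40):
        x_poisoned, y_unchanged = corrupt_target_class(
            x_train, y_train, target_class=3, alpha=alpha, delta=delta)
        model = train_cnn(x_poisoned, y_unchanged)  # placeholder training routine
        asr = attack_success_rate(
            model, x_test, y_test, target_class=3,
            signal=ramp_signal(x_test.shape[1], x_test.shape[2], delta))
        print(f"alpha={alpha:.1f} delta={delta} attack_success_rate={asr:.3f}")
```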
In conclusion, this research contributes to the understanding of adversarial machine learning, particularly backdoor attacks, and emphasizes the need for ongoing research to protect neural networks from this class of stealthy, sophisticated attacks.