- The paper demonstrates that fine-pruning, combining pruning and fine-tuning, effectively mitigates backdoor attacks across multiple DNN applications.
- The authors show that pruning removes the neurons in which backdoor behaviour hides (those dormant on clean inputs), while fine-tuning restores accuracy on clean inputs and disrupts any residual backdoor behaviour.
- Results show significant reductions in backdoor success rates, notably in traffic sign, speech, and face recognition, underlining the method's robust performance.
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
The paper "Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks" by Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg presents a comprehensive paper on defending against backdoor attacks in Deep Neural Networks (DNNs). The authors investigate the vulnerabilities introduced by outsourced DNN training, where a malicious trainer might embed hidden backdoors in the model. This paper systematically analyzes existing backdoor attacks and presents an effective defense mechanism termed fine-pruning, a combination of pruning and fine-tuning.
Overview of the Problem
DNNs achieve state-of-the-art performance across many domains, but their training demands significant computational resources, so it is often outsourced to third parties. This outsourcing creates an opening for adversarial interference: a malicious trainer can insert a backdoor during training, causing the DNN to behave normally on benign inputs but to misclassify inputs that contain a specific trigger. Such maliciously altered models can have severe consequences in safety-critical applications like autonomous driving and facial recognition.
Examined Attacks
The authors replicate three prominent backdoor attacks to evaluate potential defenses:
- Traffic Sign Recognition Attack: The attack uses a Post-It note as a trigger to misclassify traffic signs.
- Speech Recognition Attack: The attack involves embedding noise patterns into audio inputs to misclassify spoken digits.
- Face Recognition Attack: The attack employs specific sunglasses as triggers to impersonate a target individual.
The replication of these attacks confirms their efficacy: the backdoored models retain high accuracy on clean data, while the trigger induces the attacker's intended misclassification on nearly all backdoored inputs.
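To make the data-poisoning step concrete, here is a minimal, hypothetical sketch (not the authors' code) of how a patch-style trigger might be stamped onto training images and the labels poisoned; it assumes PyTorch, images in (N, C, H, W) layout with values in [0, 1], and the function name `apply_patch_trigger` is illustrative:

```python
import torch

def apply_patch_trigger(images, target_label, patch_size=4, patch_value=1.0):
    """Stamp a small square trigger into the bottom-right corner of each image
    and relabel the poisoned samples with the attacker's target class."""
    poisoned = images.clone()
    # Overwrite a patch_size x patch_size corner region in every image (the trigger).
    poisoned[:, :, -patch_size:, -patch_size:] = patch_value
    # All poisoned samples are assigned the attacker-chosen target label.
    labels = torch.full((images.size(0),), target_label, dtype=torch.long)
    return poisoned, labels
```

A backdoored training set mixes such poisoned samples with clean data, so the model learns both the legitimate task and the trigger-to-target mapping.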
Pruning as a Defense
The paper first explores pruning, which removes neurons that are dormant (i.e., have low average activation) on clean inputs. Pruning exploits the observation that backdoor behaviour typically hides in the spare capacity of a DNN, in neurons that clean inputs rarely activate. The authors find that pruning does reduce the backdoor success rate, but it fails against a more sophisticated, pruning-aware attacker who concentrates clean and backdoor behaviour onto the same neurons, so that removing dormant neurons no longer removes the backdoor.
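A minimal sketch of this pruning idea, assuming PyTorch and a convolutional model whose final convolutional layer is passed in as `last_conv` (the helper name `prune_dormant_channels` and the pruning fraction are illustrative, not from the paper's code):

```python
import torch

@torch.no_grad()
def prune_dormant_channels(model, last_conv, clean_loader, prune_fraction=0.4):
    """Disable the output channels of `last_conv` with the lowest mean
    activation on clean data, approximating the pruning defense."""
    model.eval()
    batch_means = []
    # Record per-channel mean activations (after ReLU) for each clean batch.
    hook = last_conv.register_forward_hook(
        lambda mod, inp, out: batch_means.append(out.relu().mean(dim=(0, 2, 3)))
    )
    for x, _ in clean_loader:
        model(x)
    hook.remove()

    mean_act = torch.stack(batch_means).mean(dim=0)   # per-channel mean activation
    n_prune = int(prune_fraction * mean_act.numel())
    dormant = mean_act.argsort()[:n_prune]            # lowest-activation channels

    # "Prune" by zeroing the weights (and biases) of the dormant channels.
    last_conv.weight[dormant] = 0.0
    if last_conv.bias is not None:
        last_conv.bias[dormant] = 0.0
    return dormant
```

Zeroing weights is a simple stand-in for structurally removing the neurons; the effect on the network's function is the same.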
Fine-Pruning Defense
Given the limitations of pruning alone, the authors propose fine-pruning, which combines pruning with fine-tuning. Fine-tuning briefly retrains the pruned model on clean, validated data. The two steps work synergistically: pruning removes the decoy neurons that do not contribute to clean-data classification, while fine-tuning updates the remaining weights, restoring accuracy on clean inputs and disrupting the backdoor behaviour encoded in neurons that are also active on clean data. Fine-pruning is shown to be highly effective against all tested attacks, including the pruning-aware one.
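Building on the pruning sketch above, a hedged sketch of the combined procedure might look as follows; the function name `fine_prune`, the optimizer choice, and the hyperparameters are illustrative assumptions, not values from the paper:

```python
import torch
from torch import nn, optim

def fine_prune(model, last_conv, clean_loader, prune_fraction=0.4,
               epochs=2, lr=1e-4):
    """Fine-pruning sketch: prune dormant channels, then briefly fine-tune
    the pruned network on clean data."""
    dormant = prune_dormant_channels(model, last_conv, clean_loader, prune_fraction)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            # Keep the pruned channels disabled throughout fine-tuning.
            with torch.no_grad():
                last_conv.weight[dormant] = 0.0
                if last_conv.bias is not None:
                    last_conv.bias[dormant] = 0.0
    return model
```

The key design point is the ordering: pruning first strips away capacity the attacker relies on, and the short clean-data fine-tune then repairs clean accuracy while overwriting whatever backdoor behaviour remains in the surviving weights.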
Results and Implications
Experimental results demonstrate that fine-pruning can significantly diminish backdoor effectiveness while maintaining high accuracy on clean data:
- For face recognition, the backdoor success rate dropped from 100% to 0% with fine-pruning, with only a slight drop in accuracy.
- For speech recognition, fine-pruning reduced the backdoor success rate to 2% from 77%.
- In traffic sign recognition, fine-pruning cut the backdoor success rate from 99% to 29%.
These results highlight fine-pruning as a robust and computationally feasible defense against sophisticated backdoor attacks. They emphasize the importance of blending multiple defensive strategies to counter the evolving threats in DNNs.
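For context on how such numbers are measured, a small sketch of the two standard metrics, clean accuracy and backdoor (attack) success rate, assuming PyTorch and a held-out loader of trigger-stamped inputs (the function name `evaluate` is illustrative):

```python
import torch

@torch.no_grad()
def evaluate(model, clean_loader, triggered_loader, target_label):
    """Return (clean accuracy, attack success rate) for a given model.

    An attack "succeeds" when a trigger-stamped input is classified
    as the attacker's target label."""
    model.eval()
    correct = total = 0
    for x, y in clean_loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.size(0)

    hits = seen = 0
    for x, _ in triggered_loader:
        hits += (model(x).argmax(dim=1) == target_label).sum().item()
        seen += x.size(0)

    return correct / total, hits / seen
```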
Speculation on Future Developments
The paper's insights into the intricate dynamics of neuron activations in the presence of backdoors provide a promising direction for future research. One area of interest could be the application of these findings to other types of neural architectures, such as RNNs and LSTMs, which are prevalent in natural language processing tasks.
Additionally, exploring the theoretical underpinnings of adversarial retraining and noise-based defenses could yield more general, theoretically grounded defense mechanisms. The observation that adversarial training objectives can reshape the loss landscape, and hence the local minima a network settles into, opens potential lines of inquiry into optimization-based defenses.
Conclusion
The paper "Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks" makes a significant contribution to the field of AI security by presenting a novel and effective defense mechanism against backdoor attacks. Fine-pruning, as a hybrid strategy, leverages the strengths of both pruning and fine-tuning, culminating in a defense that is both practical and robust. This work not only provides immediate defenses for critical applications but also sets the stage for future advancements in securing DNNs against emerging threats.