Decoupling the Training Process for Backdoor Defense in DNNs
In the paper "Backdoor Defense via Decoupling the Training Process," the authors present a novel approach to mitigating backdoor attacks in deep neural networks (DNNs). Backdoor attacks pose significant security threats as they involve injecting malicious samples into the training set that result in the neural network exhibiting incorrect behaviors when triggered. The approach proposed in this paper leverages a decoupled training process to reduce these risks, focusing on learning robust feature representations through self-supervised learning.
Core Contributions
The research examines how backdoors are embedded by studying the clustering behavior of poisoned samples in the feature space of DNNs. The authors show that, under end-to-end supervised learning, poisoned samples form a tight cluster, and it is this label-driven clustering that lets the trigger become associated with the attacker's target class. To break this mechanism, a three-stage training process is introduced:
- Self-Supervised Learning of the Feature Extractor: First, the DNN backbone is trained with self-supervised learning on the training images with all labels discarded, so that representations are driven by the intrinsic content of each sample rather than by its label. Samples with similar visual content map to nearby points in feature space, which prevents the trigger from pulling poisoned samples into a tight, label-aligned cluster (see the first sketch after this list).
- Supervised Training of the Classification Layers: The second stage freezes the backbone and trains only the fully connected classification layers on the labeled dataset. Because the (possibly poisoned) labels can influence only this head and not the feature extractor, the trigger cannot be wired into the learned features (see the second sketch after this list).
- Semi-Supervised Fine-Tuning for Robustness: Finally, the whole network is fine-tuned with semi-supervised learning. Samples on which the stage-2 classifier incurs a low loss are deemed high-credibility and keep their labels, while the remaining, potentially poisoned samples have their labels removed and are used as unlabeled data, further refining the feature representations (see the third sketch after this list).
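To make stage 1 concrete, below is a minimal PyTorch sketch assuming a SimCLR-style contrastive objective as the self-supervised task; the ResNet-18 backbone, projection head, temperature, and the dummy tensors standing in for a real two-view augmented data loader are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two augmented views of the same batch."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, d)
    sim = z @ z.t() / temperature                             # pairwise similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    # The positive for sample i is its other augmented view at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Backbone without its classification head, plus a small projection head
# used only during self-supervised pretraining.
backbone = resnet18(weights=None)
backbone.fc = nn.Identity()                                   # expose 512-d features
projector = nn.Sequential(nn.Linear(512, 512), nn.ReLU(inplace=True), nn.Linear(512, 128))
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(projector.parameters()),
    lr=0.4, momentum=0.9, weight_decay=1e-4)

# Dummy batch: in practice these are two random augmentations (crop, flip,
# color jitter) of the same images, with the labels discarded entirely.
view1, view2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
loss = nt_xent_loss(projector(backbone(view1)), projector(backbone(view2)))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```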
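Stage 2 then trains only a classifier head on top of the frozen backbone. A minimal sketch, assuming a single linear head and a 10-class problem such as CIFAR-10; `backbone` is the feature extractor from the previous sketch, and the dummy batch stands in for the labeled (possibly poisoned) training set:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# `backbone` is the self-supervised feature extractor from the previous sketch.
for p in backbone.parameters():        # freeze it: the labels never touch the features
    p.requires_grad_(False)
backbone.eval()

classifier = nn.Linear(512, 10)        # e.g. 10 classes for CIFAR-10
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)

images = torch.randn(8, 3, 32, 32)     # dummy labeled batch
labels = torch.randint(0, 10, (8,))

with torch.no_grad():                  # features are fixed during this stage
    feats = backbone(images)
loss = F.cross_entropy(classifier(feats), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```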
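For stage 3, the key step is splitting the training set by per-sample loss under the stage-2 classifier. The sketch below uses plain cross-entropy and a fixed keep ratio as simplifying assumptions; the paper's exact credibility criterion may differ:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def split_by_credibility(backbone, classifier, images, labels, keep_ratio=0.5):
    """Keep the lowest-loss fraction of samples as 'high-credibility' labeled data.

    Because the frozen backbone was trained without labels, poisoned samples no
    longer sit near the target class in feature space, so the stage-2 classifier
    tends to fit them poorly and they end up with large losses.
    """
    logits = classifier(backbone(images))
    per_sample_loss = F.cross_entropy(logits, labels, reduction='none')
    order = per_sample_loss.argsort()
    k = int(keep_ratio * len(labels))
    keep, drop = order[:k], order[k:]
    labeled = (images[keep], labels[keep])     # labels retained
    unlabeled = images[drop]                   # labels discarded: treated as unlabeled
    return labeled, unlabeled
```

The resulting labeled and unlabeled subsets would then drive a standard semi-supervised objective (for example, consistency regularization or MixMatch-style training) that fine-tunes both the backbone and the classifier.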
Empirical Validation
The paper provides comprehensive experimental validation on standard datasets (CIFAR-10, ImageNet) with architectures such as ResNet, demonstrating the efficacy of the proposed defense. Notably, the decoupled training methodology maintains high benign accuracy (BA) while reducing the attack success rate (ASR) of the implanted backdoors to negligible levels (below 2%).
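For reference, BA is ordinary test accuracy on clean images, while ASR is the fraction of trigger-stamped images classified as the attacker's target label. A minimal sketch of both metrics; the `apply_trigger` function and `target_label` are hypothetical placeholders rather than part of the paper's code:

```python
import torch

@torch.no_grad()
def benign_accuracy(model, loader, device='cpu'):
    """Fraction of clean test samples classified correctly (BA)."""
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

@torch.no_grad()
def attack_success_rate(model, loader, apply_trigger, target_label, device='cpu'):
    """Fraction of triggered samples classified as the attacker's target (ASR).

    Samples whose true label already equals the target are excluded, as is common.
    """
    hits = total = 0
    for images, labels in loader:
        keep = labels != target_label
        if keep.sum() == 0:
            continue
        triggered = apply_trigger(images[keep])   # hypothetical trigger-stamping fn
        preds = model(triggered.to(device)).argmax(dim=1).cpu()
        hits += (preds == target_label).sum().item()
        total += keep.sum().item()
    return hits / total
```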
Insights and Implications
By decoupling the training process, the proposed method addresses a weakness of end-to-end supervision, which can inadvertently teach the network to respond to harmful triggers. Self-supervised learning of the backbone lets the model acquire natural, generalizable features free of label-induced clustering, directly removing the mechanism that backdoor triggers exploit. This paradigm is especially valuable when training data is sourced from untrusted third parties, a common situation in outsourced or distributed learning settings.
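The clustering argument can also be checked directly on a trained feature extractor: if poisoned samples form a tight cluster, their pairwise cosine similarities are close to 1, whereas under a self-supervised backbone they spread out like ordinary samples. A small illustrative sketch with synthetic feature vectors; in practice these would be the backbone's features for poisoned and benign images:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mean_pairwise_cosine(features):
    """Average off-diagonal cosine similarity: a rough measure of cluster tightness."""
    z = F.normalize(features, dim=1)
    sim = z @ z.t()
    n = sim.size(0)
    return (sim.sum() - n) / (n * (n - 1))     # exclude the diagonal of ones

# Simulated features: poisoned samples collapse onto one direction (tight cluster),
# benign samples are spread out.
anchor = torch.randn(1, 512)
poisoned_feats = anchor + 0.1 * torch.randn(64, 512)
benign_feats = torch.randn(64, 512)
print(float(mean_pairwise_cosine(poisoned_feats)))   # close to 1.0
print(float(mean_pairwise_cosine(benign_feats)))     # close to 0.0
```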
Future Directions
The authors suggest that future work could examine other self-supervised learning techniques and their robustness against adaptive adversarial strategies. Extending the decoupling idea to other machine learning domains, and building a theoretical understanding of why self-supervised learning separates poisoned features from their label-induced biases, remain areas of continued research interest.
In summary, the proposed decoupling strategy represents a promising avenue for enhancing the robustness of DNNs against backdoor threats by altering the training dynamics to emphasize intrinsic data structures over potentially compromised supervised labels.