Anti-Backdoor Learning: Training Clean Models on Poisoned Data
Backdoor attacks represent a significant security threat to deep neural networks (DNNs), leveraging data poisoning to introduce latent vulnerabilities during the training phase. The paper "Anti-Backdoor Learning: Training Clean Models on Poisoned Data" addresses the core problem of training clean models directly on datasets containing backdoored data, without relying on a priori knowledge of the backdoor distribution. The authors propose Anti-Backdoor Learning (ABL), a framework built around a two-stage gradient ascent mechanism: one stage isolates suspected backdoor examples early in training, and the other unlearns them later in training.
In reviewing existing literature, traditional defenses against backdoor attacks generally focus on either detection or erasure of backdoors. While these methods have shown promise in identifying backdoors and mitigating their impact once detected, they do not prevent the model from learning the backdoor triggers in the first place. The paper therefore poses a formidable yet underexplored question in backdoor defense: can a model trained on poisoned data be made as robust as one trained only on clean data?
Central to ABL is the view of model training on poisoned data as a dual-task problem: the network learns the clean task and the backdoor task simultaneously. The authors identify two weaknesses inherent to backdoor attacks: (1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the convergence on the backdoored examples; (2) the backdoor task is tied to a specific (target) class label, a dependency that can be disrupted.
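A minimal formalization of this dual-task view, with $\mathcal{D}_c$ and $\mathcal{D}_b$ denoting the clean and backdoored portions of the training set and $\ell$ the per-example loss (the notation here is a hedged paraphrase, not copied from the paper):

```latex
\mathcal{L}(\theta) =
\underbrace{\mathbb{E}_{(x,y)\sim\mathcal{D}_c}\big[\ell(f_\theta(x), y)\big]}_{\text{clean task}}
+ \underbrace{\mathbb{E}_{(x,y)\sim\mathcal{D}_b}\big[\ell(f_\theta(x), y)\big]}_{\text{backdoor task}}
```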
The ABL framework integrates these insights into its two-stage learning process:
- Backdoor Isolation: In the early training stage, ABL employs a local gradient ascent (LGA) technique to separate backdoor examples from clean examples via their loss values: whenever an example's loss falls below a preset threshold, gradient ascent pushes it back up toward that threshold. Because backdoored data converge quickly, they are the examples whose losses collapse to the threshold, so the lowest-loss examples can be flagged and isolated early in training (see the sketch after this list).
- Backdoor Unlearning: In the later training stages, ABL unlearns the backdoor correlations. The examples isolated in the previous stage, suspected to be backdoor-generated, are trained with a global gradient ascent mechanism that maximizes rather than minimizes their loss, breaking the link between the backdoor trigger and its target class while standard training continues on the remaining data (also illustrated in the sketch below).
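The following PyTorch-style sketch shows how the two stages could be wired together. It is an illustrative reconstruction under stated assumptions, not the authors' code: the threshold `gamma`, the `isolation_rate`, and the assumption that the data loader also yields sample indices are hypothetical choices made for the example.

```python
# Hedged sketch of the two ABL stages described above (assumes PyTorch).
import torch
import torch.nn.functional as F


def lga_loss(logits, targets, gamma=0.5):
    """Local gradient ascent (LGA) loss for the isolation stage.

    When a sample's cross-entropy drops below the threshold gamma, the sign
    flips and gradient descent pushes its loss back up toward gamma, so only
    easily fit (likely backdoored) samples end up pinned near the threshold.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")
    return (torch.sign(ce - gamma) * ce).mean()


def isolate_low_loss_examples(model, loader, device, isolation_rate=0.01):
    """Flag the lowest-loss training examples as suspected backdoor samples."""
    model.eval()
    losses, indices = [], []
    with torch.no_grad():
        for x, y, idx in loader:  # loader assumed to also yield sample indices
            ce = F.cross_entropy(model(x.to(device)), y.to(device), reduction="none")
            losses.append(ce.cpu())
            indices.append(idx)
    losses, indices = torch.cat(losses), torch.cat(indices)
    k = int(isolation_rate * len(losses))
    return set(indices[torch.argsort(losses)[:k]].tolist())  # k lowest-loss samples


def unlearning_step(model, optimizer, clean_batch, isolated_batch, device):
    """One later-stage step: gradient descent on clean data, ascent on isolated data."""
    model.train()
    xc, yc = (t.to(device) for t in clean_batch)
    xb, yb = (t.to(device) for t in isolated_batch)
    loss = F.cross_entropy(model(xc), yc) - F.cross_entropy(model(xb), yb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch, `lga_loss` would replace the standard cross-entropy during the first training epochs, `isolate_low_loss_examples` would then flag roughly the lowest 1% of losses, and `unlearning_step` would drive the later fine-tuning epochs.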
The empirical results demonstrate ABL's efficacy across three standard benchmarks (CIFAR-10, GTSRB, and an ImageNet subset) and against 10 state-of-the-art backdoor attacks. ABL-trained models achieve clean accuracy competitive with models trained on purely clean data while significantly reducing attack success rates across a range of sophisticated backdoor scenarios. For instance, with an isolation rate of just 1%, ABL counters poisoning rates as high as 50%, reducing attack success to negligible levels.
The implications of these findings are twofold. Practically, ABL provides a robust mechanism for enhancing model security in settings where data purity cannot be guaranteed. Theoretically, the insights into the learnability and class dependency of backdoor attacks contribute to a deeper understanding of adversarial vulnerabilities in neural networks.
Looking forward, future research on anti-backdoor learning could adapt ABL with dynamic loss thresholds, extend it to real-world and federated learning settings, and strengthen its defense against more subtle and adaptive backdoor strategies. Improving isolation precision and developing a more general unlearning mechanism are also worthwhile pursuits.
In conclusion, by exploiting inherent weaknesses of backdoor attacks, Anti-Backdoor Learning marks a significant stride toward holistic and proactive strategies for secure machine learning deployment amid increasing adversarial threats.