
Defending Neural Backdoors via Generative Distribution Modeling (1910.04749v2)

Published 10 Oct 2019 in cs.LG and stat.ML

Abstract: Neural backdoor attacks are emerging as a severe security threat to deep learning, while the capability of existing defense methods is limited, especially against complex backdoor triggers. In this work, we explore the space formed by the pixel values of all possible backdoor triggers. An original trigger used by an attacker to build the backdoored model represents only a point in this space. It will then be generalized into a distribution of valid triggers, all of which can influence the backdoored model. Thus, previous methods that model only one point of the trigger distribution are not sufficient. Recovering the entire trigger distribution, e.g., via generative modeling, is key to effective defense. However, existing generative modeling techniques for image generation are not applicable to the backdoor scenario because the trigger distribution is completely unknown. In this work, we propose the max-entropy staircase approximator (MESA), an algorithm for high-dimensional sampling-free generative modeling, and use it to recover the trigger distribution. We also develop a defense technique to remove the triggers from the backdoored model. Our experiments on the CIFAR-10/100 datasets demonstrate the effectiveness of MESA in modeling the trigger distribution and the robustness of the proposed defense method.

Citations (171)

Summary

  • The paper presents a novel defense against neural backdoor attacks by modeling the full generative distribution of potential malicious triggers, unlike previous methods focusing on single triggers.
  • The authors introduce the max-entropy staircase approximator (MESA), a scalable algorithm for high-dimensional, sampling-free generative modeling that recovers the trigger distribution with an ensemble of sub-models.
  • Experiments show MESA significantly lowers attack success rates from over 92% to below 6% on CIFAR datasets, demonstrating the effectiveness of distribution-based defense against complex triggers.

Defending Neural Backdoors via Generative Distribution Modeling

The paper "Defending Neural Backdoors via Generative Distribution Modeling" addresses a significant security challenge posed by neural backdoor attacks on deep learning models. Neural backdoors exploit the model's training process to insert malicious triggers, resulting in models that misclassify inputs activated by these triggers into an attacker-specified class. This threat is exacerbated by the flexibility of backdoor attacks, which can utilize a variety of inconspicuous triggers. Current defense methods fail to sufficiently counteract these sophisticated attacks, motivating the research in this paper.

The authors present a novel approach that models the distribution of potential backdoor triggers rather than a single point trigger, a common limitation of existing methods. They introduce the max-entropy staircase approximator (MESA), a scalable algorithm designed for high-dimensional, sampling-free generative modeling of trigger distributions. MESA avoids direct sampling from the trigger distribution, which is infeasible because the distribution is unknown and typically high-dimensional. Instead, the algorithm constructs multiple sub-models, each capturing a portion of the trigger space through a staircase approximation combined with entropy maximization. This ensemble recovers the trigger distribution and enables the formulation of an effective defense technique.
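
The following is a minimal, hedged sketch of the staircase idea: one generator sub-model per success threshold beta, each trained to produce diverse trigger candidates whose attack success on the backdoored model stays above that threshold. The batch-spread diversity term is a simple stand-in for the paper's entropy estimator, and backdoored_model, the 3x3 trigger size, the latent dimension, and the threshold values are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TriggerGenerator(nn.Module):
    """Maps a latent code z to a small trigger patch with values in [0, 1]."""
    def __init__(self, latent_dim=16, trigger_shape=(3, 3, 3)):
        super().__init__()
        self.trigger_shape = trigger_shape
        out_dim = trigger_shape[0] * trigger_shape[1] * trigger_shape[2]
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z).view(-1, *self.trigger_shape)

def stamp(images, triggers):
    """Paste 3x3 triggers onto the bottom-right corner of each image."""
    patched = images.clone()
    patched[:, :, -3:, -3:] = triggers
    return patched

def train_submodel(backdoored_model, images, target_class, beta,
                   latent_dim=16, steps=200):
    """Train one MESA-style sub-model for success threshold `beta`."""
    for p in backdoored_model.parameters():      # only the generator is trained
        p.requires_grad_(False)
    gen = TriggerGenerator(latent_dim)
    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    for _ in range(steps):
        z = torch.randn(images.size(0), latent_dim)
        triggers = gen(z)
        logits = backdoored_model(stamp(images, triggers))
        success = torch.softmax(logits, dim=1)[:, target_class]
        # Staircase constraint: penalize triggers whose attack success < beta.
        constraint = torch.relu(beta - success).mean()
        # Batch-spread surrogate for entropy: keep generated triggers diverse.
        diversity = torch.pdist(triggers.flatten(1)).mean()
        loss = constraint - 0.1 * diversity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return gen

# An ensemble over several thresholds approximates the trigger distribution:
# generators = [train_submodel(model, clean_images, 0, b) for b in (0.5, 0.7, 0.9)]
```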

Experimental results demonstrate MESA's capability to accurately model trigger distributions on the CIFAR-10 and CIFAR-100 datasets. The approach significantly enhances defense robustness, reducing attack success rates (ASRs) from over 92.3% before defense to below 5.9% afterwards across a variety of complex triggers, far outperforming traditional methods that rely on single-point trigger reconstruction. These findings underscore the importance of considering the full distribution of valid triggers for robust backdoor defense.
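
A patching-style defense can then fine-tune the backdoored model on clean-labeled images stamped with triggers sampled from the recovered distribution, so the trigger-to-target-class association is unlearned. The sketch below assumes the gen and stamp helpers from the previous snippet and a standard data loader; it illustrates the idea rather than reproducing the paper's exact procedure or hyperparameters.

```python
import random
import torch
import torch.nn.functional as F

def patch_model(backdoored_model, generators, loader, latent_dim=16,
                epochs=1, lr=1e-4):
    """Fine-tune on trigger-stamped images that keep their correct labels."""
    for p in backdoored_model.parameters():      # re-enable gradients if they
        p.requires_grad_(True)                   # were frozen during recovery
    opt = torch.optim.SGD(backdoored_model.parameters(), lr=lr, momentum=0.9)
    backdoored_model.train()
    for _ in range(epochs):
        for images, labels in loader:
            gen = random.choice(generators)      # pick a recovered sub-model
            z = torch.randn(images.size(0), latent_dim)
            triggers = gen(z).detach()           # sampled trigger candidates
            patched = stamp(images, triggers)    # stamp them on clean images
            # Correct labels break the trigger -> target-class association.
            loss = F.cross_entropy(backdoored_model(patched), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return backdoored_model
```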

Practically, this research enriches the toolkit for defending against data poisoning and neural backdoors, offering a potentially more reliable method of detecting and mitigating backdoor effects in deployed models without access to the original datasets. Theoretically, it opens new pathways for generative modeling applications in adversarial machine learning, suggesting future work could explore other high-dimensional sampling challenges or alternative forms of adversarial threats.

In conclusion, this paper provides a compelling argument and solution for addressing the limitations of current backdoor defenses. By modeling trigger distributions rather than isolated triggers, it sets a new standard for robustness against neural backdoor attacks. This advancement could influence future developments in AI security, prompting further investigation into distribution-based modeling techniques for varied adversarial scenarios.