Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models (2407.21316v2)
Abstract: Diffusion models (DMs) are regarded as one of the most advanced generative models today, yet recent studies suggest that they are vulnerable to backdoor attacks, which establish hidden associations between particular input patterns and model behaviors, compromising model integrity by causing undesirable actions with manipulated inputs. This vulnerability poses substantial risks, including reputational damage to model owners and the dissemination of harmful content. To mitigate the threat of backdoor attacks, there have been some investigations on backdoor detection and model repair. However, previous work fails to reliably purify the models backdoored by state-of-the-art attack methods, rendering the field much underexplored. To bridge this gap, we introduce Diff-Cleanse, a novel two-stage backdoor defense framework specifically designed for DMs. The first stage employs a novel trigger inversion technique to reconstruct the trigger and detect the backdoor, and the second stage utilizes a structural pruning method to eliminate the backdoor. We evaluate our framework on hundreds of DMs that are attacked by three existing backdoor attack methods with a wide range of hyperparameter settings. Extensive experiments demonstrate that Diff-Cleanse achieves nearly 100\% detection accuracy and effectively mitigates backdoor impacts, preserving the model's benign performance with minimal compromise. Our code is avaliable at https://github.com/shymuel/diff-cleanse.
- Elijah: Eliminating backdoors injected in diffusion models via distribution shift. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 10847–10855, 2024.
- Trojdiff: Trojan attacks on diffusion models with diverse targets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4035–4044, 2023.
- Targeted backdoor attacks on deep learning systems using data poisoning. CoRR, abs/1712.05526, 2017.
- How to backdoor diffusion models? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4015–4024, 2023.
- Villandiffusion: A unified backdoor attack framework for diffusion models. NeurIPS 2023 Workshop on Backdoors in Deep Learning, 2023.
- Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021.
- Structural pruning for diffusion models. Advances in neural information processing systems, 36, 2024.
- Improving end-to-end autonomous driving with synthetic data from latent diffusion models. In First Vision and Language for Autonomous Driving and Robotics Workshop.
- Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 7:47230–47244, 2019.
- Ufid: A unified framework for input-level backdoor detection on diffusion models. arXiv preprint arXiv:2404.01101, 2024.
- Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
- Fastdiff: A fast conditional diffusion model for high-quality speech synthesis. IJCAI2022, pages 4132–4138, 2022.
- Diff-tts: A denoising diffusion model for text-to-speech. Proc. Interspeech 2021, pages 3605–3609, 2021.
- Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.
- Diffwave: A versatile diffusion model for audio synthesis. arXiv preprint arXiv:2009.09761, 2020.
- Learning multiple layers of features from tiny images. 2009.
- Diffusion-lm improves controllable text generation. Advances in Neural Information Processing Systems, 35:4328–4343, 2022.
- Fine-pruning: Defending against backdooring attacks on deep neural networks. In International symposium on research in attacks, intrusions, and defenses, pages 273–294. Springer, 2018.
- Ppgan: Privacy-preserving generative adversarial network. In 2019 IEEE 25Th international conference on parallel and distributed systems (ICPADS), pages 985–989. IEEE, 2019.
- Deep learning face attributes in the wild. In Proceedings of the IEEE international conference on computer vision, pages 3730–3738, 2015.
- Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. Advances in Neural Information Processing Systems, 35:5775–5787, 2022.
- Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022.
- Vidm: Video implicit diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 9117–9125, 2023.
- A morphology focused diffusion probabilistic model for synthesis of histopathology images. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 2000–2009, 2023.
- Importance estimation for neural network pruning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11264–11272, 2019.
- Input-aware dynamic backdoor attack. Advances in Neural Information Processing Systems, 33:3454–3464, 2020.
- Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
- Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
- Generative modeling by estimating gradients of the data distribution. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
- Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
- Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pages 707–723. IEEE, 2019.
- Diffusion models for medical anomaly detection. In International Conference on Medical image computing and computer-assisted intervention, pages 35–45. Springer, 2022.
- Adversarial neuron pruning purifies backdoored deep models. Advances in Neural Information Processing Systems, 34:16913–16925, 2021.
- One-to-n & n-to-one: Two advanced backdoor attacks against deep learning models. IEEE Transactions on Dependable and Secure Computing, 19(3):1562–1578, 2020.
- Video diffusion models with local-global context guidance. arXiv preprint arXiv:2306.02562, 2023.
- Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902, 2022.
- Unipc: A unified predictor-corrector framework for fast sampling of diffusion models. Advances in Neural Information Processing Systems, 36, 2024.
- Data-free backdoor removal based on channel lipschitzness. In European Conference on Computer Vision, pages 175–191. Springer, 2022.
- Jiang Hao (2 papers)
- Xiao Jin (19 papers)
- Hu Xiaoguang (1 paper)
- Chen Tianyou (1 paper)
- Zhao Jiajia (1 paper)