LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning (2404.12852v1)
Abstract: Deep neural networks are vulnerable to backdoor attacks. Among existing backdoor defenses, trigger reverse engineering based approaches, which reconstruct the backdoor trigger via optimization, are the most versatile and effective. In this paper, we summarize and construct a generic paradigm for the typical trigger reverse engineering process. Based on this paradigm, we propose a new perspective for defeating trigger reverse engineering by manipulating the classification confidences of backdoor samples. To determine the specific modification of classification confidence, we propose a compensatory model that computes the lower bound of the required modification. With proper modifications, the backdoor attack can easily bypass trigger reverse engineering based methods. To achieve this objective, we propose a Label Smoothing Poisoning (LSP) framework, which leverages label smoothing to specifically manipulate the classification confidences of backdoor samples. Extensive experiments demonstrate that the proposed work defeats state-of-the-art trigger reverse engineering based methods and is compatible with a variety of existing backdoor attacks.
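The abstract gives no implementation details, but the core idea, applying label smoothing only to the poisoned (trigger-carrying) training samples so that their confidence on the attacker's target class is capped rather than driven toward 1, can be illustrated with a minimal sketch. The smoothing factor `epsilon`, the poison mask, and the function names below are illustrative assumptions, not the authors' actual implementation; in particular, the paper derives the smoothing amount from the compensatory model's lower bound, which this sketch does not reproduce.

```python
import torch
import torch.nn.functional as F


def smoothed_target(target_class: int, num_classes: int, epsilon: float) -> torch.Tensor:
    """Standard label-smoothed probability vector for one sample:
    the target class keeps 1 - epsilon of the mass, the rest is
    spread uniformly over the remaining classes."""
    t = torch.full((num_classes,), epsilon / (num_classes - 1))
    t[target_class] = 1.0 - epsilon
    return t


def lsp_style_loss(logits: torch.Tensor, labels: torch.Tensor,
                   is_poisoned: torch.Tensor, target_class: int,
                   num_classes: int, epsilon: float) -> torch.Tensor:
    """Cross-entropy with soft targets (hypothetical sketch).

    Clean samples keep one-hot targets; poisoned samples use a
    label-smoothed target toward the attacker's class, which caps the
    confidence the backdoored model assigns to trigger inputs and thus
    weakens the objective that trigger reverse engineering optimizes.
    """
    soft = F.one_hot(labels, num_classes).float()
    smooth = smoothed_target(target_class, num_classes, epsilon).to(logits.device)
    soft[is_poisoned] = smooth  # overwrite targets of poisoned rows only
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft * log_probs).sum(dim=1).mean()
```

The design intuition: trigger reverse engineering searches for a pattern that pushes the model's target-class confidence toward 1, so deliberately limiting that confidence on backdoor samples removes the strong optimization signal the defense relies on, while the backdoor itself can still flip predictions.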
Authors: Beichen Li, Yuanfang Guo, Heqi Peng, Yangxi Li, Yunhong Wang