Test-Time Backdoor Defense via Detecting and Repairing (2308.06107v2)
Abstract: Deep neural networks play a crucial role in many critical domains, such as autonomous driving, face recognition, and medical diagnosis. However, they are vulnerable to backdoor attacks, through which an attacker can manipulate a model into attacker-chosen behaviors. To defend against backdoor attacks, prior research has focused on using clean data to remove the backdoor before model deployment. In this paper, we investigate the possibility of defending against backdoor attacks at test time by utilizing partially poisoned data to remove the backdoor from the model. To address this problem, we propose a two-stage method, Test-Time Backdoor Defense (TTBD). In the first stage, a backdoor sample detection method, DDP, identifies poisoned samples from a batch of mixed, partially poisoned samples. Once the poisoned samples are detected, we employ Shapley estimation to measure each neuron's contribution to the network's behavior, locate the poisoned neurons, and prune them to remove the backdoor from the model. Our experiments demonstrate that TTBD successfully removes the backdoor with only a batch of partially poisoned data, across different model architectures and datasets, and against different types of backdoor attacks.
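The abstract only outlines the second stage at a high level. The following is a minimal, hypothetical sketch of how sampling-based Shapley estimation over convolutional channels, followed by pruning of the highest-contribution channels, could look in PyTorch. The function names (`attack_success_rate`, `shapley_channel_scores`, `prune_top_channels`), the choice of pruning whole channels of a single layer, and the toy model and data are all illustrative assumptions; this is not the paper's DDP detector or its actual pruning procedure.

```python
# Illustrative sketch only: Monte Carlo (sampling-based) Shapley scoring of
# convolutional channels on detected poisoned inputs, then pruning the
# highest-scoring channels. All names and the utility function are assumptions.
import copy
import torch
import torch.nn as nn


def attack_success_rate(model, poisoned_x, target_label):
    # Utility: fraction of (detected) poisoned inputs classified as the attack target.
    model.eval()
    with torch.no_grad():
        preds = model(poisoned_x).argmax(dim=1)
    return (preds == target_label).float().mean().item()


def shapley_channel_scores(model, layer_name, poisoned_x, target_label, n_rounds=10):
    # Sampling-based Shapley estimate: for random permutations of channels, zero
    # them out one at a time and credit each channel with the marginal drop in
    # attack success rate it causes when removed.
    layer = dict(model.named_modules())[layer_name]
    n_channels = layer.out_channels
    scores = torch.zeros(n_channels)
    for _ in range(n_rounds):
        m = copy.deepcopy(model)
        conv = dict(m.named_modules())[layer_name]
        prev = attack_success_rate(m, poisoned_x, target_label)
        for c in torch.randperm(n_channels).tolist():
            with torch.no_grad():
                conv.weight[c].zero_()
                if conv.bias is not None:
                    conv.bias[c] = 0.0
            cur = attack_success_rate(m, poisoned_x, target_label)
            scores[c] += prev - cur  # marginal contribution of channel c
            prev = cur
    return scores / n_rounds


def prune_top_channels(model, layer_name, scores, k):
    # Prune by zeroing the k channels with the largest estimated contribution.
    conv = dict(model.named_modules())[layer_name]
    with torch.no_grad():
        for c in scores.topk(k).indices.tolist():
            conv.weight[c].zero_()
            if conv.bias is not None:
                conv.bias[c] = 0.0
    return model


if __name__ == "__main__":
    # Toy usage on a randomly initialized CNN with random inputs; in practice the
    # inputs would be the samples flagged as poisoned by the first-stage detector.
    toy = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )
    x = torch.randn(8, 3, 32, 32)
    scores = shapley_channel_scores(toy, "0", x, target_label=0, n_rounds=2)
    prune_top_channels(toy, "0", scores, k=4)
```

In a realistic setting, pruning would typically be followed by a check that clean accuracy is preserved, which this sketch omits.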
- Jiyang Guan
- Jian Liang
- Ran He