Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack (2405.16134v2)

Published 25 May 2024 in cs.CV

Abstract: Deep neural networks face persistent challenges in defending against backdoor attacks, leading to an ongoing battle between attacks and defenses. While existing backdoor defense strategies have shown promising performance in reducing attack success rates, can we confidently claim that the backdoor threat has truly been eliminated from the model? To address this question, we re-investigate the characteristics of backdoored models after defense (denoted as defense models). Surprisingly, we find that the original backdoors still exist in defense models derived from existing post-training defense strategies, as measured by a novel metric called the backdoor existence coefficient. This implies that the backdoors merely lie dormant rather than being eliminated. To further verify this finding, we empirically show that these dormant backdoors can be easily re-activated during inference by manipulating the original trigger with a well-designed tiny perturbation obtained via a universal adversarial attack. More practically, we extend backdoor re-activation to the black-box scenario, where the adversary can only query the defense model during inference, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks. The effectiveness of the proposed methods is verified on both image classification and multimodal contrastive learning (i.e., CLIP) tasks. In conclusion, this work uncovers a critical vulnerability that has never been explored in existing defense strategies, emphasizing the urgency of designing more robust and advanced backdoor defense mechanisms in the future.
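The core claim of the abstract is operational: a tiny, universal perturbation of the original trigger, found with a standard adversarial-attack loop against the defended model, can restore a dormant backdoor. The sketch below illustrates the white-box variant of this idea under assumed names and interfaces (defense_model, trigger_mask, trigger_pattern, target_label, epsilon are all hypothetical); it is a minimal PGD-style approximation, not the authors' implementation.

```python
# Hypothetical sketch: learn one small universal perturbation of the original trigger
# so that the *defended* model once again maps triggered inputs to the target label.
import torch
import torch.nn.functional as F

def learn_reactivation_perturbation(defense_model, loader, trigger_mask,
                                    trigger_pattern, target_label,
                                    epsilon=8 / 255, steps=100, lr=1 / 255):
    """PGD-style optimization of a universal perturbation on the trigger region."""
    defense_model.eval()
    for p in defense_model.parameters():        # only the perturbation is optimized
        p.requires_grad_(False)
    delta = torch.zeros_like(trigger_pattern, requires_grad=True)
    for _ in range(steps):
        for images, _ in loader:                # clean images available to the adversary
            # stamp the perturbed trigger onto each image
            poisoned = images * (1 - trigger_mask) + \
                       (trigger_pattern + delta).clamp(0, 1) * trigger_mask
            logits = defense_model(poisoned)
            target = torch.full((images.size(0),), target_label, dtype=torch.long)
            loss = F.cross_entropy(logits, target)   # pull predictions back to the target
            loss.backward()
            with torch.no_grad():
                delta -= lr * delta.grad.sign()      # descend on the target loss
                delta.clamp_(-epsilon, epsilon)      # keep the perturbation tiny (l_inf ball)
            delta.grad.zero_()
    return delta.detach()
```

In the black-box setting described in the abstract, gradients of the defense model are unavailable, so the same objective would instead be estimated from query feedback (query-based) or optimized on a surrogate model and transferred (transfer-based).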

Authors (3)
  1. Mingli Zhu (12 papers)
  2. Siyuan Liang (73 papers)
  3. Baoyuan Wu (107 papers)
Citations (9)