UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks (2312.10657v1)

Published 17 Dec 2023 in cs.CR

Abstract: Backdoor attacks are emerging threats to deep neural networks, which typically embed malicious behaviors into a victim model by injecting poisoned samples. Adversaries can activate the injected backdoor during inference by presenting the trigger on input images. Prior defensive methods have achieved remarkable success in countering dirty-label backdoor attacks, where the poisoned samples are mislabeled. However, these approaches do not work against a newer type of backdoor: clean-label backdoor attacks, which imperceptibly modify poisoned data while keeping their labels consistent. More sophisticated algorithms are needed to defend against such stealthy attacks. In this paper, we propose UltraClean, a general framework that simplifies the identification of poisoned samples and defends against both dirty-label and clean-label backdoor attacks. Because backdoor triggers introduce adversarial noise that intensifies during feed-forward propagation, UltraClean first generates two variants of each training sample using off-the-shelf denoising functions. It then measures the susceptibility of training samples by leveraging the error amplification effect in DNNs, which dilates the noise difference between the original image and its denoised variants. Lastly, it filters out poisoned samples based on their susceptibility to thwart backdoor implantation. Despite its simplicity, UltraClean achieves a superior detection rate across various datasets and significantly reduces the backdoor attack success rate while maintaining decent model accuracy on clean data, outperforming existing defensive methods by a large margin. Code is available at https://github.com/bxz9200/UltraClean.
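
For illustration, the sketch below shows one way the three-step pipeline described in the abstract could look in code. It is a minimal sketch under stated assumptions, not the authors' implementation: the choice of median and Gaussian filters as the two off-the-shelf denoisers, the use of output logits to measure the error-amplification gap, and the `removal_ratio` threshold are all hypothetical stand-ins.

```python
import numpy as np
import torch
from scipy.ndimage import median_filter, gaussian_filter


def denoise_variants(img):
    """Build two denoised copies of an HxWxC image in [0, 1].

    Median and Gaussian filters are stand-ins for the paper's
    "off-the-shelf denoising functions"; the exact choices may differ.
    """
    med = median_filter(img, size=(3, 3, 1))
    gau = gaussian_filter(img, sigma=(1.0, 1.0, 0.0))
    return med, gau


@torch.no_grad()
def susceptibility_scores(model, images, device="cpu"):
    """Per-sample susceptibility score.

    The feed-forward pass amplifies residual trigger noise, so poisoned
    images tend to show a larger output gap between the original and its
    denoised variants. Here the gap is measured on the logits (an
    assumption; intermediate features could be used instead).
    """
    model.eval().to(device)
    scores = []
    for img in images:  # img: HxWxC float array in [0, 1]
        med, gau = denoise_variants(img)
        batch = np.stack([img, med, gau]).transpose(0, 3, 1, 2)  # NCHW
        x = torch.from_numpy(batch).float().to(device)
        out = model(x)                      # shape (3, num_classes)
        d1 = (out[0] - out[1]).abs().sum()  # original vs. median-denoised
        d2 = (out[0] - out[2]).abs().sum()  # original vs. Gaussian-denoised
        scores.append((d1 + d2).item())
    return np.asarray(scores)


def filter_poisoned(images, labels, scores, removal_ratio=0.05):
    """Drop the most susceptible fraction of samples before retraining.

    removal_ratio is a hypothetical hyperparameter, not a value from the
    paper; in practice it would be tuned to the expected poisoning rate.
    """
    keep = scores.argsort()[: int(len(scores) * (1 - removal_ratio))]
    return [images[i] for i in keep], [labels[i] for i in keep]
```

Samples whose outputs shift the most after denoising are treated as the most suspicious; the retained subset would then be used to train (or retrain) a model free of the backdoor.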

Authors (2)
  1. Bingyin Zhao (5 papers)
  2. Yingjie Lao (22 papers)