INK: Inheritable Natural Backdoor Attack Against Model Distillation (2304.10985v3)
Abstract: Deep learning models are vulnerable to backdoor attacks, where attackers inject malicious behavior through data poisoning and later exploit triggers to manipulate deployed models. To improve the stealth and effectiveness of backdoors, prior studies have introduced various imperceptible attack methods designed to evade both defense mechanisms and manual inspection. However, all poisoning-based attacks still rely on privileged access to the training dataset. Consequently, model distillation using a trusted dataset has emerged as an effective defense against these attacks. To bridge this gap, we introduce INK, an inheritable natural backdoor attack that targets model distillation. The key insight behind INK is to use statistical features that occur naturally in any dataset, allowing attackers to leverage them as backdoor triggers without direct access to the training data. Specifically, INK employs image variance as the backdoor trigger and enables both clean-image and clean-label attacks by manipulating either the labels or the image variance in an unauthenticated dataset. Once the backdoor is embedded, it transfers from the teacher model to the student model, even when defenders use a trusted dataset for distillation. Theoretical analysis and experimental results demonstrate the robustness of INK against transformation-based, search-based, and distillation-based defenses. For instance, INK maintains an attack success rate of over 98% post-distillation, compared to an average success rate of 1.4% for existing methods.
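To make the abstract's description of a variance-based trigger concrete, the following is a minimal sketch of how pixel variance could be rescaled to act as a natural trigger and later checked at inference time. The target standard deviation, tolerance, and function names are illustrative assumptions, not the paper's exact embedding procedure.

```python
import numpy as np

def embed_variance_trigger(image: np.ndarray, target_std: float = 0.30) -> np.ndarray:
    """Rescale an image's pixel standard deviation to a chosen target while keeping its mean.

    Hypothetical sketch of a variance-based trigger; `target_std` is an
    illustrative assumption. Expects float pixel values in [0, 1].
    """
    mean = image.mean()
    std = image.std() + 1e-8  # avoid division by zero for constant images
    triggered = (image - mean) / std * target_std + mean
    return np.clip(triggered, 0.0, 1.0)

def has_variance_trigger(image: np.ndarray, target_std: float = 0.30, tol: float = 0.02) -> bool:
    """Check whether an image's pixel standard deviation lies near the trigger value."""
    return abs(float(image.std()) - target_std) < tol

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((32, 32, 3)).astype(np.float32)   # stand-in for a CIFAR-sized image
    x_poisoned = embed_variance_trigger(x)
    print(has_variance_trigger(x), has_variance_trigger(x_poisoned))
```

Because such a statistic is present in every image and is largely preserved by the images a defender uses for distillation, a model that learns to key its prediction on it can pass that behavior from teacher to student, which is the property the abstract refers to as inheritability.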