On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder (2403.03846v1)

Published 6 Mar 2024 in cs.LG

Abstract: In this paper, we study distillation, a defense originally used in supervised learning, as a defense against poisoned encoders in SSL. Distillation aims to distill knowledge from a given model (a.k.a. the teacher net) and transfer it to another (a.k.a. the student net). Here, we use it to distill benign knowledge from poisoned pre-trained encoders and transfer it to a new encoder, resulting in a clean pre-trained encoder. In particular, we conduct an empirical study of the effectiveness and performance of distillation against poisoned encoders. Using two state-of-the-art backdoor attacks against pre-trained image encoders and four commonly used image classification datasets, our experimental results show that distillation can reduce the attack success rate from 80.87% to 27.51% while suffering a 6.35% loss in accuracy. Moreover, we investigate the impact of three core components of distillation on performance: the teacher net, the student net, and the distillation loss. By comparing 4 different teacher nets, 3 student nets, and 6 distillation losses, we find that fine-tuned teacher nets, warm-up-training-based student nets, and attention-based distillation loss perform best, respectively.
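
To make the setup concrete, below is a minimal PyTorch-style sketch of the teacher-student distillation loop the abstract describes, using an attention-based distillation loss in the spirit of attention transfer. The encoder interface (each encoder returning an embedding plus a list of intermediate feature maps), the function names, and the `beta` weight are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: distilling benign knowledge from a (possibly poisoned) teacher
# encoder into a fresh student encoder with an attention-based distillation loss.
# Interfaces and hyperparameters here are illustrative assumptions.

import torch
import torch.nn.functional as F


def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Collapse a (B, C, H, W) feature map into a normalized spatial attention map."""
    att = feat.pow(2).mean(dim=1)              # (B, H, W): per-location channel energy
    return F.normalize(att.flatten(1), dim=1)  # L2-normalize per sample


def attention_distill_loss(student_feats, teacher_feats):
    """Sum of squared L2 distances between student and teacher attention maps."""
    return sum(
        (attention_map(s) - attention_map(t)).pow(2).sum(dim=1).mean()
        for s, t in zip(student_feats, teacher_feats)
    )


def distill_step(student, teacher, images, optimizer, beta=1000.0):
    """One distillation step on a batch of clean, unlabeled images.

    Assumes `student(images)` / `teacher(images)` each return
    (embedding, list_of_intermediate_feature_maps).
    """
    teacher.eval()
    with torch.no_grad():
        t_emb, t_feats = teacher(images)

    s_emb, s_feats = student(images)

    # Match final embeddings plus intermediate attention maps.
    loss = F.mse_loss(s_emb, t_emb) + beta * attention_distill_loss(s_feats, t_feats)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The three components the paper ablates map directly onto this sketch: the choice of `teacher` (e.g., fine-tuned), the initialization of `student` (e.g., warm-up training), and the form of the distillation loss (here, attention-based).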
