On the Difficulty of Defending Contrastive Learning against Backdoor Attacks (2312.09057v1)

Published 14 Dec 2023 in cs.CR, cs.AI, and cs.CV

Abstract: Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks wherein malicious functions are injected into target models, only to be activated by specific triggers. However, thus far it remains under-explored how contrastive backdoor attacks fundamentally differ from their supervised counterparts, which impedes the development of effective defenses against the emerging threat. This work represents a solid step toward answering this critical question. Specifically, we define TRL, a unified framework that encompasses both supervised and contrastive backdoor attacks. Through the lens of TRL, we uncover that the two types of attacks operate through distinctive mechanisms: in supervised attacks, the learning of benign and backdoor tasks tends to occur independently, while in contrastive attacks, the two tasks are deeply intertwined both in their representations and throughout their learning processes. This distinction leads to the disparate learning dynamics and feature distributions of supervised and contrastive attacks. More importantly, we reveal that the specificities of contrastive backdoor attacks entail important implications from a defense perspective: existing defenses for supervised attacks are often inadequate and not easily retrofitted to contrastive attacks. We also explore several alternative defenses and discuss their potential challenges. Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks, pointing to promising directions for future research.

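The abstract's central point, that in contrastive attacks the benign and backdoor tasks are learned jointly rather than independently, can be made concrete with a toy sketch. Below is a minimal, hypothetical illustration of a poisoning-based contrastive backdoor attack in a SimCLR-style setup: a small trigger patch is stamped onto a fraction of the unlabeled pre-training data, and the poisoned views then pass through the same InfoNCE objective as benign views, so the trigger's representation becomes entangled with benign features. The function names (apply_trigger, info_nce), the toy encoder, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's code): poisoning a fraction of unlabeled
# images with a trigger patch before SimCLR-style contrastive pre-training.
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_trigger(images, patch_size=4, value=1.0):
    """Stamp a solid patch in the bottom-right corner of each image (the 'trigger')."""
    poisoned = images.clone()
    poisoned[:, :, -patch_size:, -patch_size:] = value
    return poisoned

def info_nce(z1, z2, temperature=0.5):
    """Standard InfoNCE / NT-Xent loss over two augmented views."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2N, d)
    sim = z @ z.t() / temperature                              # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                 # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)                       # positive = the other view

# Toy encoder and data: in a real attack the encoder would be e.g. a ResNet and
# the data an unlabeled pre-training set; only a small fraction is poisoned.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
opt = torch.optim.SGD(encoder.parameters(), lr=0.1)

images = torch.rand(256, 3, 32, 32)              # stand-in for unlabeled data
poison_rate = 0.01
n_poison = int(poison_rate * len(images))
images[:n_poison] = apply_trigger(images[:n_poison])

for _ in range(5):                               # a few illustrative training steps
    view1 = images + 0.1 * torch.randn_like(images)   # crude stand-ins for augmentations
    view2 = images + 0.1 * torch.randn_like(images)
    loss = info_nce(encoder(view1), encoder(view2))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the poisoned samples are optimized with the same contrastive objective as everything else, there is no separate "backdoor task" for a defense to isolate, which is one way to read the paper's claim that defenses built for supervised backdoors do not retrofit easily.
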
Authors (7)
  1. Changjiang Li (22 papers)
  2. Ren Pang (15 papers)
  3. Bochuan Cao (16 papers)
  4. Zhaohan Xi (20 papers)
  5. Jinghui Chen (50 papers)
  6. Shouling Ji (136 papers)
  7. Ting Wang (213 papers)
Citations (4)
