
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP (2311.16194v2)

Published 26 Nov 2023 in cs.CV

Abstract: Contrastive Vision-Language Pre-training, known as CLIP, has shown promising effectiveness on downstream image recognition tasks. However, recent works revealed that the CLIP model can be implanted with a downstream-oriented backdoor: on downstream tasks, the victim model performs well on clean samples but predicts a specific target class whenever a specific trigger is present. To inject a backdoor, existing attacks depend on a large amount of additional data to maliciously fine-tune the entire pre-trained CLIP model, which makes them inapplicable in data-limited scenarios. In this work, motivated by the recent success of learnable prompts, we address this problem by injecting a backdoor into the CLIP model during the prompt learning stage. Our method, named BadCLIP, is built on a novel and effective mechanism for backdoor attacks on CLIP: influencing both the image and text encoders with the trigger. It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can alter text features via trigger-aware prompts, resulting in a powerful and generalizable attack. Extensive experiments on 11 datasets verify that the clean accuracy of BadCLIP is comparable to that of advanced prompt learning methods, and the attack success rate exceeds 99% in most cases. BadCLIP also generalizes to unseen classes and shows strong generalization under cross-dataset and cross-domain settings.
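The mechanism the abstract describes can be sketched as follows. This is a minimal, hypothetical numpy illustration, not the authors' implementation: the "encoders" are stand-in linear maps for CLIP's frozen towers, and `trigger` and `G` (the context generator) are shown as fixed values, whereas in BadCLIP they are the learnable components optimized during prompt learning. The key idea it demonstrates is that the prompt context is computed *from the image feature*, so a pixel-space trigger shifts the text features as well as the image feature.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8          # embedding dimension (toy size; real CLIP uses 512+)
N_CLASS = 3

# Frozen "encoders": stand-ins for CLIP's image and text towers.
# The attack keeps these frozen and learns only the trigger and
# the context generator.
W_img = rng.normal(size=(16, D))
W_txt = rng.normal(size=(D, D))

def encode_image(x):
    f = x @ W_img
    return f / np.linalg.norm(f)

def encode_text(ctx, class_embs):
    # Prompt = learnable context + class token embedding.
    f = (ctx + class_embs) @ W_txt
    return f / np.linalg.norm(f, axis=-1, keepdims=True)

class_embs = rng.normal(size=(N_CLASS, D))

# Attacker-controlled components (random here for illustration):
# a pixel-space trigger, and a trigger-aware context generator G
# that maps the image feature to the prompt context.
trigger = 0.1 * rng.normal(size=16)
G = rng.normal(size=(D, D))

def predict(x):
    img_f = encode_image(x)
    ctx = img_f @ G          # context depends on the (possibly triggered) image
    txt_f = encode_text(ctx, class_embs)
    return int(np.argmax(txt_f @ img_f))  # cosine-similarity classification

x_clean = rng.normal(size=16)
x_bad = x_clean + trigger    # attacker applies the trigger at test time
print(predict(x_clean), predict(x_bad))
```

In the actual attack, the trigger and generator are trained jointly so that clean inputs keep their correct predictions (preserving clean accuracy) while triggered inputs map to the attacker's target class; because the context generator conditions on the image, the same learned prompts transfer to unseen classes.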

Authors (6)
  1. Jiawang Bai (23 papers)
  2. Kuofeng Gao (23 papers)
  3. Shaobo Min (13 papers)
  4. Shu-Tao Xia (171 papers)
  5. Zhifeng Li (74 papers)
  6. Wei Liu (1135 papers)
Citations (23)