BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection (2405.15269v2)

Published 24 May 2024 in cs.CV and cs.LG

Abstract: Multimodal contrastive learning methods (e.g., CLIP) have shown impressive zero-shot classification performance, owing to their strong ability to learn joint representations of visual and textual modalities. However, recent research has revealed that pre-training on data poisoned with a small proportion of maliciously backdoored samples can induce a backdoored CLIP model that can be attacked with a high success rate via inserted triggers in downstream tasks. To defend against backdoor attacks on CLIP, existing defense methods focus on either the pre-training stage or the fine-tuning stage, which unfortunately incurs high computational costs due to the large number of parameter updates. In this paper, we provide the first attempt at a computationally efficient backdoor detection method that defends against backdoored CLIP in the inference stage. We empirically find that the visual representations of backdoored images are insensitive to both benign and malignant changes in class description texts. Motivated by this observation, we propose BDetCLIP, a novel test-time backdoor detection method based on contrastive prompting. Specifically, we first prompt a large language model (e.g., GPT-4) to produce class-related description texts (benign) and class-perturbed random texts (malignant) using specially designed instructions. Then, the difference in the distribution of cosine similarities between an image and the two types of class description texts serves as the criterion for detecting backdoored samples. Extensive experiments validate that the proposed BDetCLIP is superior to state-of-the-art backdoor detection methods in terms of both effectiveness and efficiency.
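
The following is a minimal sketch of the contrastive-prompting criterion described in the abstract, not the authors' released implementation. It assumes a CLIP checkpoint loaded through Hugging Face transformers (the checkpoint name is a placeholder) and that the benign and malignant prompt lists have already been produced by an LLM as the paper describes; the per-prompt mean aggregation, the threshold tau, and all function names are illustrative assumptions.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Load the (possibly backdoored) CLIP model under test.
# "openai/clip-vit-base-patch32" is only a placeholder checkpoint.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@torch.no_grad()
def embed_texts(texts):
    inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
    feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)


@torch.no_grad()
def embed_images(images):
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1)


def contrastive_prompting_score(images, benign_prompts, malignant_prompts):
    """Assumed scoring rule: mean cosine similarity to benign class descriptions
    minus mean cosine similarity to class-perturbed (malignant) texts.
    Backdoored images are expected to be insensitive to the text change,
    so their gap should be small."""
    img = embed_images(images)                 # (N, d), unit-normalized
    benign = embed_texts(benign_prompts)       # (B, d)
    malignant = embed_texts(malignant_prompts) # (M, d)
    gap = (img @ benign.T).mean(dim=1) - (img @ malignant.T).mean(dim=1)
    return gap                                 # small gap -> suspected backdoor

# Illustrative usage: flag test images whose similarity gap falls below a
# threshold tau chosen on held-out clean data (tau is an assumption here).
# scores = contrastive_prompting_score(pil_images, benign_prompts, malignant_prompts)
# is_backdoor = scores < tau
```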

Authors (6)
  1. Yuwei Niu (6 papers)
  2. Shuo He (7 papers)
  3. Qi Wei (52 papers)
  4. Feng Liu (1212 papers)
  5. Lei Feng (190 papers)
  6. Zongyu Wu (15 papers)