Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning (2403.16257v1)

Published 24 Mar 2024 in cs.CV

Abstract: Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features by exploiting the complementary strengths of different data modalities. However, the open nature of such systems inadvertently increases the risk of backdoor attacks. These attacks subtly embed malicious behaviors within the model during training, which can be activated by specific triggers at inference time, posing significant security risks. Although existing fine-tuning countermeasures reduce the adverse impact of such attacks, these defenses often degrade clean accuracy and require constructing extensive clean training pairs. In this paper, we explore the possibility of a lower-cost defense from the perspective of model unlearning, that is, whether the model can be made to quickly unlearn backdoor threats (UBT) by constructing a small set of poisoned samples. Specifically, we strengthen the backdoor shortcuts through overfitting training that prioritizes weak-similarity samples, thereby exposing suspicious samples. Building on this initial identification of suspicious samples, we introduce a token-based localized forgetting training regime. This technique specifically targets the poisoned aspects of the model, concentrating the unlearning effort on the backdoor associations while avoiding damage to the integrity of the overall model. Experimental results show that our method not only keeps the attack success rate minimal but also preserves the model's high clean accuracy.
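
The abstract outlines a two-stage recipe: first flag suspicious image-text pairs (those with weak cross-modal similarity, amplified by a short overfitting pass), then apply a localized unlearning objective so that only the backdoor associations are forgotten. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' UBT implementation: the toy encoders, the `select_suspects` and `unlearn_step` helpers, the temperature, and all hyperparameters are assumptions, and a real setup would substitute actual CLIP encoders and the paper's token-level localization.

```python
# Illustrative sketch only: toy encoders stand in for CLIP; this is NOT the
# authors' UBT implementation, just a hedged outline of the two stages the
# abstract describes (suspect selection, then localized unlearning).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Placeholder for a CLIP image or text encoder (assumption)."""
    def __init__(self, in_dim=32, embed_dim=16):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        return F.normalize(self.proj(x), dim=-1)

image_enc, text_enc = ToyEncoder(), ToyEncoder()

# Stage 1 (sketch): rank image-text pairs by cosine similarity and treat the
# weakest-similarity pairs as candidates for the suspicious (poisoned) set.
@torch.no_grad()
def select_suspects(images, texts, k=8):
    sims = (image_enc(images) * text_enc(texts)).sum(dim=-1)
    return torch.topk(-sims, k).indices  # indices of lowest-similarity pairs

# Stage 2 (sketch): "unlearn" the suspect pairs by ascending the contrastive
# loss on them while descending it on clean pairs, restricted here to the
# text projection as a crude stand-in for token-localized forgetting.
def unlearn_step(opt, sus_imgs, sus_txts, clean_imgs, clean_txts, lam=0.5):
    def clip_loss(im, tx):
        logits = image_enc(im) @ text_enc(tx).t() / 0.07  # temperature assumed
        labels = torch.arange(im.size(0))
        return 0.5 * (F.cross_entropy(logits, labels)
                      + F.cross_entropy(logits.t(), labels))

    loss = -clip_loss(sus_imgs, sus_txts) + lam * clip_loss(clean_imgs, clean_txts)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    imgs, txts = torch.randn(64, 32), torch.randn(64, 32)
    idx = select_suspects(imgs, txts)
    # Only the text projection is updated, mimicking a localized forgetting pass.
    opt = torch.optim.SGD(text_enc.proj.parameters(), lr=1e-2)
    for _ in range(5):
        unlearn_step(opt, imgs[idx], txts[idx], imgs[:16], txts[:16])
```

The negative sign on the suspect-pair loss is what turns the step into unlearning (gradient ascent on the backdoor associations), while the retention term on clean pairs hedges against collapsing clean accuracy, mirroring the trade-off the abstract emphasizes.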

Authors (7)
  1. Siyuan Liang (73 papers)
  2. Kuanrong Liu (3 papers)
  3. Jiajun Gong (4 papers)
  4. Jiawei Liang (8 papers)
  5. Yuan Xun (7 papers)
  6. Ee-Chien Chang (44 papers)
  7. Xiaochun Cao (177 papers)
Citations (10)
