BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning (2311.12075v3)
Abstract: Studying backdoor attacks is valuable for model copyright protection and for strengthening defenses. While existing backdoor attacks have successfully infected multimodal contrastive learning (MCL) models such as CLIP, they can be easily countered by backdoor defenses specialized for MCL models. This paper reveals a threat in this practical scenario: backdoor attacks can remain effective even after such defenses are applied. We introduce the \emph{BadCLIP} attack, which resists both backdoor detection and fine-tuning defenses. To achieve this, we draw motivation from a Bayes'-rule perspective and propose a dual-embedding guided framework for backdoor attacks. Specifically, we ensure that the visual trigger patterns approximate the textual target semantics in the embedding space, making it difficult to detect the subtle parameter variations induced by backdoor learning on such natural trigger patterns. Additionally, we optimize the visual trigger patterns to align the poisoned samples with the target vision features, which hinders backdoor unlearning through clean fine-tuning. Extensive experiments demonstrate that our attack significantly outperforms state-of-the-art baselines (+45.3% ASR) in the presence of state-of-the-art backdoor defenses, rendering these mitigation and detection strategies virtually ineffective. Furthermore, our approach remains effective in more demanding scenarios such as attacking downstream tasks. We believe this paper raises awareness of the potential threats in practical applications of multimodal contrastive learning and encourages the development of more robust defense mechanisms.
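To make the dual-embedding guidance concrete, the sketch below is a minimal, hypothetical illustration (not the authors' released code) using PyTorch and the OpenAI CLIP package: a small trigger patch is optimized so that poisoned image embeddings move toward both the text embedding of an assumed target caption ("a photo of a banana") and the mean image embedding of a few target-class reference images. The image tensors, patch size, learning rate, and target label are all placeholder assumptions.

```python
# Minimal, hypothetical sketch of dual-embedding guided trigger optimization.
# NOT the authors' implementation. Assumes PyTorch and the OpenAI CLIP package
# (https://github.com/openai/CLIP); the data tensors below are random placeholders.
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep weights in fp32 so gradients flow cleanly to the input
model.eval()

# Placeholder batches; a real attack would use properly preprocessed (normalized) images.
clean_images = torch.rand(8, 3, 224, 224, device=device)    # images to be poisoned
target_images = torch.rand(4, 3, 224, 224, device=device)   # reference images of the target class
target_text = clip.tokenize(["a photo of a banana"]).to(device)  # assumed target label

with torch.no_grad():
    # Textual target semantics and target vision features serve as the two anchors.
    text_anchor = F.normalize(model.encode_text(target_text), dim=-1)
    vision_anchor = F.normalize(model.encode_image(target_images), dim=-1).mean(0, keepdim=True)

# A 16x16 trigger patch pasted into the bottom-right corner, optimized by gradient descent.
patch = torch.zeros(3, 16, 16, device=device, requires_grad=True)
optimizer = torch.optim.Adam([patch], lr=1e-2)

for step in range(200):
    poisoned = clean_images.clone()
    poisoned[:, :, -16:, -16:] = torch.sigmoid(patch)  # keep trigger pixels in [0, 1]
    emb = F.normalize(model.encode_image(poisoned), dim=-1)
    # Dual-embedding objective: pull poisoned image embeddings toward both the
    # textual target embedding and the target-class visual embedding.
    loss = -(emb @ text_anchor.t()).mean() - (emb @ vision_anchor.t()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A real attack would then stamp the optimized trigger onto the poisoned training samples; the anchors, patch placement, and equal loss weighting here are illustrative choices only.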
Authors: Siyuan Liang, Mingli Zhu, Aishan Liu, Baoyuan Wu, Xiaochun Cao, Ee-Chien Chang