VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models (2402.13851v1)
Abstract: Autoregressive Visual LLMs (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. However, we uncover the potential threat posed by backdoor attacks on autoregressive VLMs during instruction tuning. Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images, enabling malicious manipulation of the victim model's predictions with predefined triggers. Nevertheless, the frozen visual encoder in autoregressive VLMs imposes constraints on the learning of conventional image triggers. Additionally, adversaries may encounter restrictions in accessing the parameters and architectures of the victim model. To address these challenges, we propose a multimodal instruction backdoor attack, namely VL-Trojan. Our approach facilitates image trigger learning through an isolating and clustering strategy and enhance black-box-attack efficacy via an iterative character-level text trigger generation method. Our attack successfully induces target outputs during inference, significantly surpassing baselines (+62.52\%) in ASR. Moreover, it demonstrates robustness across various model scales and few-shot in-context reasoning scenarios.
- Flamingo: a visual language model for few-shot learning. Advances in Neural Information Processing Systems, 35:23716–23736, 2022.
- Vqa: Visual question answering. In Proceedings of the IEEE international conference on computer vision, pp. 2425–2433, 2015.
- Openflamingo: An open-source framework for training large autoregressive vision-language models. arXiv preprint arXiv:2308.01390, 2023.
- Cleanclip: Mitigating data poisoning attacks in multimodal contrastive learning. arXiv preprint arXiv:2303.03323, 2023.
- A new backdoor attack in cnns by training set corruption without label poisoning. In 2019 IEEE International Conference on Image Processing (ICIP), pp. 101–105. IEEE, 2019.
- High-performance large-scale image recognition without normalization. In International Conference on Machine Learning, pp. 1059–1071. PMLR, 2021.
- Poisoning and backdooring contrastive learning. arXiv preprint arXiv:2106.09667, 2021.
- Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
- Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
- Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416, 2022.
- Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.
- Backdooring multimodal learning. In 2024 IEEE Symposium on Security and Privacy (SP), pp. 31–31. IEEE Computer Society, 2023.
- Generating transferable 3d adversarial point cloud via random perturbation factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
- Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022.
- A comprehensive survey of deep learning for image captioning. ACM Computing Surveys (CsUR), 51(6):1–36, 2019.
- Mimic-it: Multi-modal in-context instruction tuning. arXiv preprint arXiv:2306.05425, 2023a.
- Otter: A multi-modal model with in-context instruction tuning. arXiv preprint arXiv:2305.03726, 2023b.
- Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597, 2023c.
- Privacy-enhancing face obfuscation guided by semantic-aware attribution maps. IEEE Transactions on Information Forensics and Security, 2023d.
- Invisible backdoor attack with sample-specific triggers. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 16463–16472, 2021.
- Backdoor learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
- Exploring inconsistent knowledge distillation for object detection with data augmentation. In Proceedings of the 31st ACM International Conference on Multimedia, pp. 768–778, 2023a.
- Poisoned forgery face: Towards backdoor attacks on face forgery detection. arXiv preprint arXiv:2402.11473, 2024.
- Efficient adversarial attacks for visual object tracking. In CEuropean Conference on Computer Vision, 2020.
- Generate more imperceptible adversarial examples for object detection. In ICML 2021 Workshop on Adversarial Machine Learning, 2021.
- A large-scale multiple-objective method for black-box attack against object detection. In European Conference on Computer Vision, 2022a.
- Imitated detectors: Stealing knowledge of black-box object detectors. In Proceedings of the 30th ACM International Conference on Multimedia, 2022b.
- Parallel rectangle flip attack: A query-based black-box attack against object detection. arXiv preprint arXiv:2201.08970, 2022c.
- Badclip: Dual-embedding guided backdoor attack on multimodal contrastive learning. arXiv preprint arXiv:2311.12075, 2023b.
- Perceptual-sensitive gan for generating adversarial patches. In AAAI, 2019.
- Spatiotemporal attacks for embodied agents. In ECCV, 2020a.
- Bias-based universal adversarial patch attack for automatic check-out. In ECCV, 2020b.
- X-adv: Physical adversarial object attacks against x-ray prohibited item detection. arXiv preprint arXiv:2302.09491, 1, 2023a.
- Exploring the relationship between architectural design and adversarially robust generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4096–4107, 2023b.
- Pre-trained trojan attacks for visual recognition. arXiv preprint arXiv:2312.15172, 2023c.
- Visual instruction tuning. arXiv preprint arXiv:2304.08485, 2023d.
- Improving adversarial transferability by stable diffusion. arXiv preprint arXiv:2311.11017, 2023e.
- Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- MosaicML. Introducing mpt-7b: A new standard for open-source, commercially usable llms, 2023.
- OpenAI. Gpt-4 technical report. 2023.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pp. 8748–8763. PMLR, 2021.
- Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. Advances in neural information processing systems, 29, 2016.
- Improving robust fariness via balance adversarial training. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pp. 15161–15169, 2023.
- Together.xyz. Releasing 3b and 7b redpajama-incite family of models including base, instruction-tuned & chat models. https://www.together.xyz/blog/redpajama-models-v1, 2023.
- Dual-key multimodal backdoors for visual question answering. In Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pp. 15375–15385, 2022.
- Poisoning language models during instruction tuning. arXiv preprint arXiv:2305.00944, 2023.
- Dual attention suppression attack: Generate adversarial camouflage in physical world. In CVPR, 2021.
- An invisible black-box backdoor attack through frequency domain. In European Conference on Computer Vision, pp. 396–413. Springer, 2022a.
- Adaptive perturbation generation for multiple backdoors detection. arXiv preprint arXiv:2209.05244, 2022b.
- Transferable adversarial attacks for image and video object detection. arXiv preprint arXiv:1811.12641, 2018.
- Backdoorbench: A comprehensive benchmark of backdoor learning. Advances in Neural Information Processing Systems, 35:10546–10559, 2022.
- Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models. arXiv preprint arXiv:2305.14710, 2023.
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67–78, 2014.
- Instruction tuning for large language models: A survey. arXiv preprint arXiv:2308.10792, 2023.
- Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023.
- Jiawei Liang (8 papers)
- Siyuan Liang (73 papers)
- Man Luo (55 papers)
- Aishan Liu (72 papers)
- Dongchen Han (12 papers)
- Ee-Chien Chang (44 papers)
- Xiaochun Cao (177 papers)