VQAttack: Transferable Adversarial Attacks on Visual Question Answering via Pre-trained Models (2402.11083v1)
Abstract: Visual Question Answering (VQA) is a fundamental task in the computer vision and natural language processing fields. Although the "pre-training & fine-tuning" learning paradigm significantly improves VQA performance, the adversarial robustness of this paradigm has not been explored. In this paper, we delve into a new problem: using a pre-trained multimodal source model to create adversarial image-text pairs and then transferring them to attack target VQA models. Correspondingly, we propose a novel VQAttack model that iteratively generates both image and text perturbations with two designed modules: the LLM-enhanced image attack module and the cross-modal joint attack module. At each iteration, the LLM-enhanced image attack module first optimizes a latent representation-based loss to generate feature-level image perturbations. It then incorporates an LLM to further enhance the image perturbations by optimizing a designed masked answer anti-recovery loss. The cross-modal joint attack module is triggered at specific iterations and updates the image and text perturbations sequentially. Notably, the text perturbation updates are based on both learned gradients in the word embedding space and word synonym-based substitution. Experimental results on two VQA datasets with five validated models demonstrate the effectiveness of the proposed VQAttack in the transferable attack setting, compared with state-of-the-art baselines. This work reveals a significant blind spot in the "pre-training & fine-tuning" paradigm on VQA tasks. Source code will be released.
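To make the iterative procedure described in the abstract concrete, below is a minimal PyTorch sketch of such a transferable attack loop. The abstract does not specify the exact losses, interfaces, or schedule, so everything here is an assumption-laden illustration, not the authors' implementation: `model.encode_image`, `model.embed_tokens`, the `masked_answer_loss` callable (standing in for the LLM-based masked answer anti-recovery loss), and the `synonyms` table are all hypothetical stand-ins.

```python
import torch
import torch.nn.functional as F

def synonym_text_update(token_ids, token_grads, embed_tokens, synonyms):
    """Hypothetical text-update step: pick the token whose embedding
    gradient is largest, then substitute the synonym whose embedding
    best aligns with the gradient (ascent) direction."""
    pos = int(token_grads.norm(dim=-1).argmax())     # most sensitive token
    cands = synonyms.get(int(token_ids[pos]), [])    # candidate synonym ids
    if cands:
        cand_emb = embed_tokens(torch.tensor(cands))  # (k, d) embeddings
        scores = cand_emb @ token_grads[pos]          # alignment with ascent
        token_ids = token_ids.clone()
        token_ids[pos] = cands[int(scores.argmax())]
    return token_ids

def vqattack_sketch(image, token_ids, model, masked_answer_loss, synonyms,
                    steps=40, eps=8 / 255, alpha=2 / 255, joint_every=10):
    """Sketch of the iterative image/text attack loop (unbatched text,
    image of shape (1, C, H, W)). `model` and `masked_answer_loss` are
    assumed interfaces; pixel-range clamping is omitted for brevity."""
    clean_feat = model.encode_image(image).detach()
    delta = torch.zeros_like(image, requires_grad=True)

    for t in range(steps):
        adv = image + delta
        emb = model.embed_tokens(token_ids).detach()
        # (1) Image attack: push the adversarial latent representation
        # away from the clean one, then add the anti-recovery term that
        # keeps an LLM from recovering the masked answer.
        feat_loss = -F.cosine_similarity(model.encode_image(adv).flatten(1),
                                         clean_feat.flatten(1)).mean()
        loss = feat_loss + masked_answer_loss(adv, emb)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # PGD-style ascent step
            delta.clamp_(-eps, eps)              # stay inside the L_inf ball
            delta.grad.zero_()

        # (2) Cross-modal joint attack at scheduled iterations: take the
        # gradient w.r.t. the word embeddings, then swap in a synonym.
        if (t + 1) % joint_every == 0:
            emb = model.embed_tokens(token_ids).detach().requires_grad_(True)
            masked_answer_loss((image + delta).detach(), emb).backward()
            token_ids = synonym_text_update(token_ids, emb.grad,
                                            model.embed_tokens, synonyms)

    return (image + delta).detach(), token_ids
```

The sign-gradient step with an L_inf budget mirrors standard PGD-style transfer attacks; the synonym substitution keeps the perturbed question semantically close to the original, which is what makes the text perturbation plausible to a human reader.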
Authors: Ziyi Yin, Muchao Ye, Tianrong Zhang, Jiaqi Wang, Han Liu, Jinghui Chen, Ting Wang, Fenglong Ma