Instruct2Attack: Language-Guided Semantic Adversarial Attacks
Abstract: We propose Instruct2Attack (I2A), a language-guided semantic attack that generates semantically meaningful perturbations according to free-form language instructions. We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction. Compared to existing noise-based and semantic attacks, I2A generates more natural and diverse adversarial examples while providing better controllability and interpretability. We further automate the attack process with GPT-4 to generate diverse image-specific text instructions. We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses, and demonstrate great transferability among a variety of network architectures.
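The core idea in the abstract, searching for an adversarial latent code rather than perturbing pixels directly, can be illustrated with a deliberately simplified sketch. This is not the I2A pipeline: the latent diffusion model, the text instruction, and the reverse diffusion guidance are all replaced by a hypothetical linear decoder (`decode_w`) and a hypothetical linear classifier (`clf_w`), and the function name, step size, and L2 radius are assumptions chosen only for illustration. The sketch shows projected gradient ascent on the classification loss with respect to the latent code.

```python
import numpy as np

def softmax(v):
    # Numerically stable softmax over a 1-D vector of logits.
    e = np.exp(v - v.max())
    return e / e.sum()

def latent_attack(z0, decode_w, clf_w, label, steps=50, lr=0.1, radius=1.0):
    """Toy latent-space adversarial search (illustrative assumption only).

    z0       : clean latent code, shape (d,)
    decode_w : hypothetical linear "decoder", image = decode_w @ z
    clf_w    : hypothetical linear classifier, logits = clf_w @ image
    label    : true class index the attack tries to move away from
    radius   : L2 bound on the latent perturbation
    """
    delta = np.zeros_like(z0)
    onehot = np.eye(clf_w.shape[0])[label]
    for _ in range(steps):
        logits = clf_w @ (decode_w @ (z0 + delta))
        # Gradient of cross-entropy w.r.t. the latent: (C D)^T (p - y).
        grad = (clf_w @ decode_w).T @ (softmax(logits) - onehot)
        # Normalized gradient ascent step (increase the loss).
        delta += lr * grad / (np.linalg.norm(grad) + 1e-12)
        # Project the perturbation back into the L2 ball of the given radius.
        norm = np.linalg.norm(delta)
        if norm > radius:
            delta *= radius / norm
    return z0 + delta

# Usage on random toy data (all shapes and values are arbitrary).
rng = np.random.default_rng(0)
decode_w = rng.normal(size=(8, 4))
clf_w = rng.normal(size=(3, 8))
z0 = rng.normal(size=4)
label = int(np.argmax(clf_w @ (decode_w @ z0)))  # treat the clean prediction as the true label
z_adv = latent_attack(z0, decode_w, clf_w, label)
```

Bounding the perturbation in latent space rather than pixel space is what lets attacks of this kind produce large but semantically structured changes in the decoded image. In this toy setting the bound only guarantees that `z_adv` stays within `radius` of `z0`; whether the prediction actually flips depends on the data.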