Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models (2308.15692v2)
Abstract: Denoising diffusion probabilistic models have shown breakthrough performance in generating photo-realistic images and human-level illustrations, surpassing prior models such as GANs. This high image-generation capability has stimulated many downstream applications across various areas. However, we find that this technology is a double-edged sword: we identify a new type of attack, the Natural Denoising Diffusion (NDD) attack, based on the finding that state-of-the-art deep neural network (DNN) models still hold their predictions even when we intentionally remove, through text prompts, the robust features that are essential to the human visual system (HVS). The NDD attack can generate low-cost, model-agnostic, and transferable adversarial examples by exploiting the natural attack capability of diffusion models. To systematically evaluate this risk, we perform a large-scale empirical study with our newly created dataset, the Natural Denoising Diffusion Attack (NDDA) dataset, and evaluate the natural attack capability by answering six research questions. Through a user study, we find that the attack achieves an 88% detection rate by DNN models while remaining stealthy to 93% of human subjects; we also find that the non-robust features embedded by diffusion models contribute to the natural attack capability. To confirm the model-agnostic and transferable attack capability, we perform the NDD attack against a Tesla Model 3 and find that 73% of the physically printed attacks are detected as stop signs. We hope this study and dataset help our community become aware of the risks in diffusion models and facilitate further research toward robust DNN models.
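To make the abstract's core mechanism concrete, below is a minimal sketch of the two-step idea: use a text prompt to ask a diffusion model for an object with its HVS-critical robust feature removed (here, a stop sign without the word "STOP"), then check whether an off-the-shelf detector still predicts the object. This is an illustrative assumption-laden sketch, not the paper's exact protocol: it assumes the Stable Diffusion v1.5 checkpoint via Hugging Face `diffusers`, a COCO-pretrained YOLOv8 model from `ultralytics`, and an arbitrary 0.5 confidence threshold; the prompt wording is likewise hypothetical.

```python
# Sketch of the NDD-attack idea: generate a "stop sign" whose robust feature
# (the STOP legend) is removed via the text prompt, then test whether a
# standard detector still predicts "stop sign".
# Assumptions (not from the paper): SD v1.5 checkpoint, YOLOv8n COCO weights,
# illustrative prompt wording, and a 0.5 confidence threshold.
import torch
from diffusers import StableDiffusionPipeline
from ultralytics import YOLO

# Text-to-image step: the prompt asks the diffusion model to drop a feature
# humans rely on while keeping the overall object appearance intact.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
prompt = "a photo of a red octagonal road sign with no text on it"
image = pipe(prompt).images[0]
image.save("ndd_candidate.png")

# Detection step: if the detector still reports "stop sign" with high
# confidence, the image is a candidate natural adversarial example.
detector = YOLO("yolov8n.pt")  # COCO class list includes "stop sign" (id 11)
results = detector("ndd_candidate.png")[0]
for box in results.boxes:
    cls_name = detector.names[int(box.cls)]
    if cls_name == "stop sign" and float(box.conf) > 0.5:
        print(f"Detected as stop sign (conf={float(box.conf):.2f}) "
              "despite the missing STOP legend")
```

In the paper's framing, an image that human subjects no longer read as a stop sign but a detector still flags as one is a successful NDD attack; scaling this generate-then-detect loop over many prompts, objects, and detectors is the kind of evaluation the NDDA dataset supports.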