
Diffusion Models for Imperceptible and Transferable Adversarial Attack (2305.08192v2)

Published 14 May 2023 in cs.CV

Abstract: Many existing adversarial attacks generate $L_p$-norm perturbations on image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without $L_p$-norm constraints, yet lacking transferability of attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further "deceive" the diffusion model which can be viewed as an implicit recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, DiffAttack, is the first that introduces diffusion models into the adversarial attack field. Extensive experiments on various model structures, datasets, and defense methods have demonstrated the superiority of our attack over the existing attack methods.

Authors (6)
  1. Jianqi Chen
  2. Hao Chen
  3. Keyan Chen
  4. Yilan Zhang
  5. Zhengxia Zou
  6. Zhenwei Shi

Summary

Insights into "Diffusion Models for Imperceptible and Transferable Adversarial Attack"

The paper "Diffusion Models for Imperceptible and Transferable Adversarial Attack" introduces a novel approach to adversarial attacks by leveraging the inherent generative and discriminative capabilities of diffusion models. The proposed method, termed DiffAttack, seeks to address the perennial trade-off in adversarial machine learning between attack imperceptibility and transferability. The work stands out by taking a departure from the conventional LpL_p-norm-based methodologies, which have historically stuck to direct pixel manipulation, and instead initiates perturbations within the latent space of diffusion models.

Methodological Advancements

DiffAttack's framework builds on the intrinsic characteristics of diffusion models, originally designed for image synthesis, to generate perturbations that remain imperceptible to humans while carrying robust semantic content. The approach exploits two key properties of diffusion models: their ability to produce visually natural outputs, and the implicit recognition ability they acquire from training on extensive datasets, which supports transfer-based attacks across varied models and defenses.

Rather than altering pixel values directly, the method perturbs the latent representation of the input image. This manipulation is complemented by content-preserving structures that keep the resulting adversarial examples faithful to the semantic intent of the source images. Additionally, by minimizing the variance of the cross-attention maps between text tokens and image pixels, DiffAttack "deceives" the diffusion model, treating it as an implicit recognition surrogate and distracting its attention away from the target regions, which improves the attack's transferability to black-box models. A sketch of this optimization is given below.
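The following is a minimal sketch of such an optimization loop, assuming the input image has already been inverted to a diffusion latent. The helpers `decode_latent` and `cross_attention_maps` are hypothetical stand-ins for the diffusion-model plumbing, and the loss weights and step counts are illustrative; this is not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def latent_space_attack(z_init, classifier, label, steps=30, lr=1e-2,
                        w_content=1.0, w_deceive=0.1):
    # z_init: diffusion latent obtained by inverting the clean image (assumed given).
    # decode_latent and cross_attention_maps are hypothetical, differentiable helpers.
    z = z_init.clone().detach().requires_grad_(True)
    x_clean = decode_latent(z_init).detach()          # reference image for content preservation
    opt = torch.optim.AdamW([z], lr=lr)

    for _ in range(steps):
        x_adv = decode_latent(z)                      # image generated from the perturbed latent

        # (1) Adversarial objective: push the classifier away from the true label.
        adv_loss = -F.cross_entropy(classifier(x_adv), label)

        # (2) Content preservation: keep the adversarial image close to the original.
        content_loss = F.mse_loss(x_adv, x_clean)

        # (3) "Deceive" term: reduce the variance of the text-image cross-attention
        #     maps so attention is spread away from the target object.
        attn = cross_attention_maps(z)                # shape (tokens, H, W), hypothetical
        deceive_loss = attn.flatten(1).var(dim=1).mean()

        loss = adv_loss + w_content * content_loss + w_deceive * deceive_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

    return decode_latent(z).detach()
```

Minimizing the cross-attention variance flattens the attention distribution instead of letting it concentrate on the labeled object, which is one plausible reading of the "deceive" objective described above.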

Experimental Insights

The research demonstrates DiffAttack's superiority through comprehensive evaluations across a broad spectrum of network architectures, including CNNs, Transformers, and MLPs, on datasets such as ImageNet-Compatible, CUB-200-2011, and Stanford Cars. The empirical results indicate that DiffAttack consistently outperforms traditional pixel-based and other unrestricted attack approaches in transferability, measured by the top-1 accuracy of attacked models, and in imperceptibility, measured by FID. Notably, while the diffusion-based approach trades away some white-box attack success relative to pixel-based attacks, its robustness against various defenses underscores the practicality of attacking in latent space.
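As a side note on evaluation, the FID-based imperceptibility comparison can be reproduced with off-the-shelf tooling. The snippet below is an evaluation sketch using torchmetrics, with `clean_images` and `adv_images` assumed to be uint8 tensors of shape (N, 3, H, W); it is not the paper's exact pipeline.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Lower FID between the clean and adversarial sets indicates the adversarial
# images stay closer to the natural image distribution (better imperceptibility).
fid = FrechetInceptionDistance(feature=2048)
fid.update(clean_images, real=True)    # clean_images: (N, 3, H, W) uint8, assumed
fid.update(adv_images, real=False)     # adv_images:   (N, 3, H, W) uint8, assumed
print(f"FID (clean vs. adversarial): {fid.compute().item():.2f}")
```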

Theoretical and Practical Implications

DiffAttack marks a significant methodological shift by incorporating diffusion models into adversarial attacks, opening a pathway to rethink attack strategies and defense mechanisms in machine learning systems. The findings imply an expansion of the attack surface beyond traditional pixel perturbations toward more abstract manipulations that preserve high image fidelity while transferring across models.

The research points to further investigation of diffusion models and their role in adversarial machine learning. By demonstrating perturbations that are both less perceptible and more transferable, the paper challenges current assumptions about the imperceptibility-transferability trade-off and motivates the design of AI systems that are resilient to such semantically driven attacks.

In sum, the methodological framework laid out in this paper not only stretches the boundaries of adversarial attack strategies but also signifies a burgeoning trend where synthesis-centric models like diffusion models are increasingly being exploited for adversarial applications, thus calling for heightened vigilance in AI safety and resilience discussions.
