Adversarial Examples are Misaligned in Diffusion Model Manifolds (2401.06637v5)
Abstract: In recent years, diffusion models (DMs) have drawn significant attention for their success in approximating data distributions, yielding state-of-the-art generative results. Their versatility, however, extends beyond generation to vision applications such as image inpainting, segmentation, and adversarial robustness. This study investigates adversarial attacks through the lens of diffusion models. Our objective is not to enhance the adversarial robustness of image classifiers; instead, we use the diffusion model to detect and analyze the anomalies these attacks introduce into images. To that end, we systematically examine how well the distributions of adversarial examples remain aligned when the examples are transformed by a diffusion model. The approach is evaluated on the CIFAR-10 and ImageNet datasets, including several image resolutions for the latter. The results demonstrate a notable capacity to discriminate between benign and attacked images, providing compelling evidence that adversarial instances do not align with the learned manifold of the DMs.
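As a rough illustration of the kind of probe the abstract describes, the sketch below transforms an image with deterministic DDIM inversion followed by DDIM reconstruction and uses the per-image reconstruction error as a detection statistic, the intuition being that images off the learned manifold reconstruct worse. This is a minimal sketch under stated assumptions, not the authors' exact pipeline: `EpsModel` is a randomly initialized stand-in for a pretrained noise-prediction network, and the error-thresholding statistic is only one plausible way to separate benign from attacked images.

```python
# Sketch: DDIM invert -> reconstruct -> score. All names below are illustrative.
import torch
import torch.nn as nn


class EpsModel(nn.Module):
    """Stand-in for a pretrained epsilon-prediction UNet (assumption, untrained)."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # A real DM conditions on t; this placeholder ignores it.
        return self.net(x_t)


def ddim_step(x, eps, a_bar_src, a_bar_dst):
    """One deterministic DDIM update from noise level a_bar_src to a_bar_dst."""
    x0_pred = (x - (1 - a_bar_src).sqrt() * eps) / a_bar_src.sqrt()
    return a_bar_dst.sqrt() * x0_pred + (1 - a_bar_dst).sqrt() * eps


@torch.no_grad()
def invert_and_reconstruct(model, x0, alphas_bar, n_steps: int = 50):
    """DDIM-invert x0 toward noise, then run the reverse chain back to an image."""
    ts = torch.linspace(0, len(alphas_bar) - 1, n_steps).long()
    x = x0.clone()
    for i in range(n_steps - 1):  # inversion: image -> latent
        eps = model(x, ts[i].expand(x.shape[0]))
        x = ddim_step(x, eps, alphas_bar[ts[i]], alphas_bar[ts[i + 1]])
    for i in reversed(range(1, n_steps)):  # generation: latent -> reconstruction
        eps = model(x, ts[i].expand(x.shape[0]))
        x = ddim_step(x, eps, alphas_bar[ts[i]], alphas_bar[ts[i - 1]])
    return x


def detection_statistic(x0, x_rec):
    """Per-image L2 reconstruction error; misaligned (attacked) images tend to score higher."""
    return (x0 - x_rec).flatten(1).norm(dim=1)


if __name__ == "__main__":
    betas = torch.linspace(1e-4, 2e-2, 1000)          # standard linear schedule
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    model = EpsModel()
    x0 = torch.rand(4, 3, 32, 32) * 2 - 1             # pretend CIFAR-10 batch in [-1, 1]
    x_rec = invert_and_reconstruct(model, x0, alphas_bar)
    print(detection_statistic(x0, x_rec))
```

In practice the scores (or the original/transformed image pairs themselves) could be thresholded or fed to a small binary classifier to flag attacked inputs; with the untrained stand-in model above the scores are of course uninformative.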
Authors: Peter Lorenz, Ricard Durall, Janis Keuper