Investigating the Adversarial Robustness of Density Estimation Using the Probability Flow ODE (2310.07084v1)
Abstract: Beyond their impressive sampling capabilities, score-based diffusion models offer a powerful analysis tool in the form of unbiased density estimation of a query sample under the training data distribution. In this work, we investigate the robustness of density estimation using the probability flow (PF) neural ordinary differential equation (ODE) model against gradient-based likelihood maximization attacks, and its relation to sample complexity, where the compressed size of a sample serves as a measure of its complexity. We introduce and evaluate six gradient-based log-likelihood maximization attacks, including a novel reverse integration attack. Our experiments on CIFAR-10 show that density estimation using the PF ODE is robust against high-complexity, high-likelihood attacks, and that in some cases adversarial samples are semantically meaningful, as expected from a robust estimator.
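
To make the setup concrete, below is a minimal sketch (not the paper's implementation) of the two ingredients the abstract refers to: estimating the log-likelihood of a query sample with the probability flow ODE of a VP-SDE, using the instantaneous change-of-variables formula with a Hutchinson trace estimator and fixed-step Euler integration, and a PGD-style gradient ascent that maximizes that estimate under an L-infinity constraint. `ScoreNet`, `log_likelihood`, `likelihood_ascent_attack`, the noise schedule, and all hyperparameters are illustrative assumptions; this is the generic attack idea, not one of the paper's six attacks or its reverse integration attack.

```python
# A minimal sketch (illustrative assumptions, not the paper's code):
# (1) PF-ODE log-likelihood of a VP-SDE via a Hutchinson trace estimator
#     and fixed-step Euler integration, and
# (2) a PGD-style attack that maximizes that log-likelihood in an L-inf ball.
import math
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    """Toy stand-in for a score model s_theta(x, t) ~= grad_x log p_t(x)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                                 nn.Linear(128, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=1))

def beta(t, beta_min=0.1, beta_max=20.0):
    # Linear VP-SDE noise schedule.
    return beta_min + t * (beta_max - beta_min)

def pf_ode_drift(score_net, x, t):
    # Probability flow ODE drift of the VP-SDE:
    # dx/dt = -1/2 * beta(t) * (x + s_theta(x, t)).
    return -0.5 * beta(t) * (x + score_net(x, t))

def log_likelihood(score_net, x0, n_steps=50):
    """log p_0(x0) = log p_1(x(1)) + int_0^1 div(drift)(x(t), t) dt (Euler)."""
    x = x0 if x0.requires_grad else x0.detach().requires_grad_(True)
    delta_logp = torch.zeros(x.shape[0], device=x.device)
    eps = torch.randn_like(x)            # fixed Hutchinson probe vector
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((1,), i * dt + 1e-3, device=x.device)
        drift = pf_ode_drift(score_net, x, t)
        # Hutchinson estimate of the divergence: eps^T (d drift / dx) eps.
        vjp = torch.autograd.grad((drift * eps).sum(), x, create_graph=True)[0]
        div = (vjp * eps).sum(dim=1)
        x = x + drift * dt               # integrate data -> prior
        delta_logp = delta_logp + div * dt
    # The VP-SDE prior at t = 1 is (approximately) a standard Gaussian.
    log_prior = -0.5 * (x ** 2).sum(dim=1) \
                - 0.5 * x.shape[1] * math.log(2 * math.pi)
    return log_prior + delta_logp

def likelihood_ascent_attack(score_net, x, radius=8 / 255, step=1 / 255,
                             n_iter=10):
    """PGD-style L-inf attack that maximizes the estimated log-likelihood."""
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        ll = log_likelihood(score_net, x_adv).sum()
        grad = torch.autograd.grad(ll, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()                 # ascend
            x_adv = x + (x_adv - x).clamp(-radius, radius)     # project
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```

In the paper's setting one would use a trained score network on CIFAR-10 and an adaptive ODE solver (e.g., torchdiffeq) rather than the toy MLP and fixed-step Euler loop above; the sketch only illustrates how a likelihood-maximization attack differentiates through the PF ODE likelihood estimate.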