Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability (2403.03967v2)
Abstract: The existence of adversarial attacks on machine learning models that are imperceptible to a human remains something of a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible to a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that although the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold directions of the data space. Our main results provide an explicit relationship between the $\ell_2$ and $\ell_{\infty}$ attack strengths of the on/off-manifold attacks and the dimension gap.
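The mechanism sketched in the abstract can be illustrated with a toy experiment (a minimal sketch, not the paper's construction; the dimensions, network width, labeling rule, and training setup below are all illustrative assumptions). Data with intrinsic dimension `d` is embedded in ambient dimension `D`, a 2-layer ReLU network is clean-trained on it, and the input gradient at a test point is split into its on-manifold and off-manifold components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: data with intrinsic dimension d embedded in ambient dimension D.
d, D, n = 2, 20, 200
U = np.linalg.qr(rng.standard_normal((D, d)))[0]   # orthonormal on-manifold basis (D x d)
Z = rng.standard_normal((n, d))                    # intrinsic coordinates
X = Z @ U.T                                        # ambient samples, all lying in span(U)
y = np.sign(Z[:, 0])                               # oracle label depends only on manifold coords

# Tiny 2-layer ReLU network f(x) = a^T ReLU(W x), clean-trained by
# full-batch gradient descent on the squared loss.
m = 64
W = rng.standard_normal((m, D)) / np.sqrt(D)
a = rng.standard_normal(m) / np.sqrt(m)

def forward(X, W, a):
    return np.maximum(X @ W.T, 0.0) @ a

lr = 0.1
for _ in range(2000):
    H = np.maximum(X @ W.T, 0.0)
    g = (H @ a - y) / n                            # d(loss)/d(pred), averaged over samples
    a -= lr * (H.T @ g)
    W -= lr * ((g[:, None] * a[None, :] * (H > 0)).T @ X)

acc = np.mean(np.sign(forward(X, W, a)) == y)      # clean training accuracy

# Input gradient at an on-manifold point, split into on-/off-manifold parts.
x = U @ rng.standard_normal(d)
active = (W @ x > 0).astype(float)
grad_x = W.T @ (a * active)                        # grad of f at x
P_on = U @ U.T                                     # projector onto the manifold
g_on = P_on @ grad_x
g_off = grad_x - g_on                              # component in the D - d "gap" directions

print(f"clean accuracy: {acc:.2f}")
print(f"on-manifold  |grad|: {np.linalg.norm(g_on):.3f}")
print(f"off-manifold |grad|: {np.linalg.norm(g_off):.3f}")
```

Because every gradient update to `W` is a linear combination of training inputs, which all lie in span(U), the off-manifold component of `W` never moves from its random initialization. The off-manifold input gradient therefore stays bounded away from zero, so a perturbation of `x` along `g_off` changes the network output even though such directions never appear in training.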
- Adler, R. J. and Taylor, J. E. (2007), “Gaussian inequalities,” Random Fields and Geometry, 49–64.
- Bartlett, P., Bubeck, S., and Cherapanamjeri, Y. (2021), “Adversarial examples in multi-layer random relu networks,” Advances in Neural Information Processing Systems, 34, 9241–9252.
- Bubeck, S., Cherapanamjeri, Y., Gidel, G., and Tachet des Combes, R. (2021), “A single gradient step finds adversarial examples on random two-layers neural networks,” Advances in Neural Information Processing Systems, 34, 10081–10091.
- Carlini, N. and Wagner, D. (2017), “Towards evaluating the robustness of neural networks,” in 2017 IEEE Symposium on Security and Privacy (SP), IEEE, pp. 39–57.
- Daniely, A. and Shacham, H. (2020), “Most ReLU Networks Suffer from $\ell^2$ Adversarial Perturbations,” in Advances in Neural Information Processing Systems, eds. Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., Curran Associates, Inc., vol. 33, pp. 6629–6636.
- Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009), “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255.
- Facco, E., d’Errico, M., Rodriguez, A., and Laio, A. (2017), “Estimating the intrinsic dimension of datasets by a minimal neighborhood information,” Scientific Reports, 7.
- Frei, S., Vardi, G., Bartlett, P. L., and Srebro, N. (2023), “The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks,” arXiv preprint arXiv:2303.01456.
- Fukunaga, K. and Olsen, D. (1971), “An Algorithm for Finding Intrinsic Dimensionality of Data,” IEEE Transactions on Computers, C-20, 176–183.
- Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014), “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572.
- Haro, G., Randall, G., and Sapiro, G. (2008), “Translated Poisson Mixture Model for Stratification Learning,” International Journal of Computer Vision, 80, 358–374.
- He, K., Zhang, X., Ren, S., and Sun, J. (2015), “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034.
- — (2016), “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Hornik, K., Stinchcombe, M., and White, H. (1989), “Multilayer feedforward networks are universal approximators,” Neural Networks, 2, 359–366.
- Laurent, B. and Massart, P. (2000), “Adaptive estimation of a quadratic functional by model selection,” Annals of Statistics, 28, 1302–1338.
- LeCun, Y., Cortes, C., and Burges, C. (2010), “MNIST handwritten digit database,” AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2.
- Ledoux, M. (2006), “Isoperimetry and Gaussian analysis,” Lectures on Probability Theory and Statistics: Ecole d’Eté de Probabilités de Saint-Flour XXIV—1994, 165–294.
- Leshno, M., Lin, V. Y., Pinkus, A., and Schocken, S. (1993), “Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,” Neural Networks, 6, 861–867.
- Liu, J., Bai, Y., Jiang, G., Chen, T., and Wang, H. (2019), “Understanding Why Neural Networks Generalize Well Through GSNR of Parameters,” in International Conference on Learning Representations.
- Lyu, K. and Li, J. (2020), “Gradient Descent Maximizes the Margin of Homogeneous Neural Networks,” in International Conference on Learning Representations.
- Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2017), “Towards deep learning models resistant to adversarial attacks,” arXiv preprint arXiv:1706.06083.
- Melamed, O., Yehudai, G., and Vardi, G. (2023), “Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Data Manifolds,” arXiv preprint arXiv:2303.00783.
- Pinkus, A. (1999), “Approximation theory of the MLP model in neural networks,” Acta numerica, 8, 143–195.
- Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K., and Madry, A. (2018), “Adversarially robust generalization requires more data,” Advances in Neural Information Processing Systems, 31.
- Shamir, A., Melamed, O., and BenShmuel, O. (2022), “The Dimpled Manifold Model of Adversarial Examples in Machine Learning,” arXiv preprint arXiv:2106.10151.
- Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2013), “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199.
- Vardi, G., Yehudai, G., and Shamir, O. (2022), “Gradient methods provably converge to non-robust networks,” Advances in Neural Information Processing Systems, 35, 20921–20932.
- Xiao, H., Rasul, K., and Vollgraf, R. (2017), “Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms,” arXiv preprint arXiv:1708.07747.
- Xiao, J., Yang, L., Fan, Y., Wang, J., and Luo, Z.-Q. (2022), “Understanding adversarial robustness against on-manifold adversarial examples,” arXiv preprint arXiv:2210.00430.
- Zhang, W., Zhang, Y., Hu, X., Goswami, M., Chen, C., and Metaxas, D. N. (2022), “A Manifold View of Adversarial Risk,” in International Conference on Artificial Intelligence and Statistics, PMLR, pp. 11598–11614.