Generalization Properties of Adversarial Training for $\ell_0$-Bounded Adversarial Attacks (2402.03576v1)
Abstract: It has been widely observed that neural networks are vulnerable to small additive perturbations of their inputs that cause misclassification. In this paper, we focus on $\ell_0$-bounded adversarial attacks and aim to theoretically characterize the performance of adversarial training for an important class of truncated classifiers. Such classifiers have been shown to perform strongly in the $\ell_0$-adversarial setting, both empirically and theoretically under the Gaussian mixture model. The main contribution of this paper is a novel, distribution-independent generalization bound for binary classification with $\ell_0$-bounded adversarial perturbations. Deriving a generalization bound in this setting poses two main challenges: (i) the truncated inner product, which is highly non-linear; and (ii) the maximization over the $\ell_0$ ball arising from adversarial training, which is non-convex and highly non-smooth. To tackle these challenges, we develop new coding techniques for bounding the combinatorial dimension of the truncated hypothesis class.
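For readers unfamiliar with the truncated classifiers referenced in the abstract, the following Python sketch illustrates one plausible instantiation: a linear classifier whose inner product discards the extreme coordinate-wise products before summing, which is what makes it resilient to an adversary who may arbitrarily overwrite at most $k$ input coordinates. The function names, the symmetric top/bottom-$k$ truncation rule, and the greedy attack heuristic below are illustrative assumptions, not the paper's exact definitions or its inner maximization.

```python
import numpy as np

def truncated_inner_product(w, x, k):
    """Sum of the coordinate-wise products w_i * x_i after discarding the
    k largest and k smallest values (assumed truncation rule)."""
    prods = np.sort(w * x)
    return prods[k:prods.size - k].sum()

def truncated_predict(w, x, k):
    """Binary prediction in {-1, +1} from the sign of the truncated inner product."""
    return 1 if truncated_inner_product(w, x, k) >= 0 else -1

def greedy_l0_attack(w, x, y, k, magnitude=10.0):
    """Heuristic l0-bounded attack (not the exact maximization over the l0 ball):
    overwrite the k coordinates that contribute most to a correct prediction
    with large values of the opposite sign."""
    x_adv = x.copy()
    contrib = y * w * x                     # per-coordinate contribution to the margin
    idx = np.argsort(contrib)[-k:]          # the k most helpful coordinates
    x_adv[idx] = -y * np.sign(w[idx]) * magnitude
    return x_adv

# Tiny demo: the truncation absorbs the k corrupted coordinates.
rng = np.random.default_rng(0)
d, k = 50, 3
w = rng.normal(size=d)
x = np.sign(w) + 0.1 * rng.normal(size=d)   # a point the classifier labels +1
x_adv = greedy_l0_attack(w, x, +1, k)
print(truncated_predict(w, x, k), truncated_predict(w, x_adv, k))
```

In this sketch the attacked coordinates receive products of very large magnitude, so they fall into the discarded extremes of the sorted product vector and the prediction is unchanged; an ordinary (untruncated) linear classifier would typically be flipped by the same perturbation.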