Adversarial Training Should Be Cast as a Non-Zero-Sum Game (2306.11035v2)
Abstract: One prominent approach toward resolving the adversarial vulnerability of deep neural networks is the two-player zero-sum paradigm of adversarial training, in which predictors are trained against adversarially chosen perturbations of data. Despite the promise of this approach, algorithms based on this paradigm have not engendered sufficient levels of robustness and suffer from pathological behavior like robust overfitting. To understand this shortcoming, we first show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on the robustness of trained classifiers. The identification of this pitfall informs a novel non-zero-sum bilevel formulation of adversarial training, wherein each player optimizes a different objective function. Our formulation yields a simple algorithmic framework that matches, and in some cases outperforms, state-of-the-art attacks, attains comparable levels of robustness to standard adversarial training algorithms, and does not suffer from robust overfitting.
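The decoupling the abstract describes, an attacker that maximizes a misclassification objective while the trainer minimizes a surrogate loss on the attacker's output, can be made concrete. Below is a minimal PyTorch sketch of one such non-zero-sum training step; the negative-margin attacker objective, the PGD-style inner loop, and the hyperparameters (`eps`, `alpha`, `steps`) are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of a non-zero-sum (bilevel) adversarial training step.
# Assumptions beyond the abstract: l-infinity threat model, PGD-style inner
# loop, and negative margin as the attacker's objective.
import torch
import torch.nn.functional as F

def negative_margin(logits, y):
    """Attacker objective: best wrong-class logit minus true-class logit.
    A positive value means the example is misclassified."""
    true = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, y.unsqueeze(1), float("-inf"))  # exclude the true class
    return masked.max(dim=1).values - true

def attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner player: maximize negative margin (not the defender's loss)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        obj = negative_margin(model(x + delta), y).sum()
        grad, = torch.autograd.grad(obj, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta = (x + delta).clamp(0, 1) - x  # keep x + delta a valid image
        delta.requires_grad_(True)
    return (x + delta).detach()

def train_step(model, opt, x, y):
    """Outer player: minimize a surrogate (cross-entropy) on the attacker's output."""
    x_adv = attack(model, x, y)
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```

The key design point the sketch isolates: unlike standard zero-sum adversarial training, the inner maximization does not ascend the defender's cross-entropy loss but a separate objective aligned with classification error, so the two players optimize genuinely different functions.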