Boosting Adversarial Training via Fisher-Rao Norm-based Regularization (2403.17520v1)
Abstract: Adversarial training is widely used to improve the adversarial robustness of deep neural networks, yet mitigating the degradation of standard generalization performance in adversarially trained models remains an open problem. This paper approaches the issue through the lens of model complexity. First, we leverage the Fisher-Rao norm, a geometrically invariant measure of model complexity, to establish non-trivial bounds on the cross-entropy-loss-based Rademacher complexity of a ReLU-activated multi-layer perceptron. From these bounds we then derive a complexity-related variable that is sensitive to changes in model width and to the trade-off factors used in adversarial training. Extensive empirical evidence confirms that this variable correlates strongly with the generalization gap in cross-entropy loss between adversarially trained and standard-trained models, especially during the initial and final phases of training. Building on this observation, we propose a novel regularization framework, Logit-Oriented Adversarial Training (LOAT), which mitigates the trade-off between robustness and accuracy while adding only negligible computational overhead. Extensive experiments demonstrate that the proposed regularization strategy boosts the performance of prevalent adversarial training algorithms, including PGD-AT, TRADES, TRADES (LSE), MART, and DM-AT, across various network architectures. Our code will be available at https://github.com/TrustAI/LOAT.
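The key quantity above is the Fisher-Rao norm (Liang et al., cited below), defined as ||theta||_fr^2 = theta^T I(theta) theta with I(theta) the Fisher information matrix. For a bias-free ReLU MLP with L hidden layers, positive homogeneity gives <theta, grad_theta f(x)> = (L+1) f(x), which collapses the norm to (L+1)^2 E[<dl/df, f(x)>^2]. Below is a minimal sketch of a batch estimate of this quantity, assuming PyTorch, softmax cross-entropy, and the empirical-Fisher variant that plugs in observed labels; `relu_mlp` and `fisher_rao_norm_sq` are illustrative names, not from the paper's code.

```python
# A minimal sketch (assumes PyTorch): empirical Fisher-Rao norm estimate
# for a bias-free ReLU MLP via the homogeneity identity of Liang et al.
import torch
import torch.nn as nn
import torch.nn.functional as F

def relu_mlp(dims):
    # Bias-free ReLU MLP: biases would break the positive homogeneity
    # that the closed-form identity below relies on.
    layers = []
    for i in range(len(dims) - 2):
        layers += [nn.Linear(dims[i], dims[i + 1], bias=False), nn.ReLU()]
    layers.append(nn.Linear(dims[-2], dims[-1], bias=False))
    return nn.Sequential(*layers)

@torch.no_grad()  # the identity removes the need for a backward pass
def fisher_rao_norm_sq(model, x, y, n_hidden_layers):
    """Batch estimate of ||theta||_fr^2 = (L+1)^2 E[<dl/df, f(x)>^2]."""
    logits = model(x)                              # f(x), shape (B, C)
    grad_f = F.softmax(logits, dim=1) - F.one_hot(
        y, logits.size(1)).float()                 # dl/df for CE loss
    inner = (grad_f * logits).sum(dim=1)           # <dl/df, f(x)> per sample
    return (n_hidden_layers + 1) ** 2 * inner.pow(2).mean()

# Toy usage: a 2-hidden-layer MLP on random data.
model = relu_mlp([32, 64, 64, 10])
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
print(fisher_rao_norm_sq(model, x, y, n_hidden_layers=2).item())
```

The (L+1)^2 factor counts the weight layers of the network; the bias-free construction matters because the homogeneity identity does not hold once biases are added.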
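The abstract specifies only that LOAT is logit-oriented, targets the initial and final phases of training, and adds negligible overhead; it does not give the regularizer's exact form. Purely as an illustrative sketch, not the paper's actual method, the snippet below shows where such a penalty could attach to a standard PGD-AT step (Madry et al., cited below). The `loat_weight` parameter and the penalty itself, which reuses the <dl/df, f(x)> quantity from the previous sketch, are hypothetical placeholders.

```python
# Illustrative only: a PGD-AT training step with a hypothetical
# logit-oriented penalty; NOT the paper's actual LOAT regularizer.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Standard L-infinity PGD inner maximization (Madry et al.)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project to eps-ball
        x_adv = x_adv.clamp(0, 1)                  # keep a valid image
    return x_adv.detach()

def training_step(model, x, y, optimizer, loat_weight=0.1):
    """One adversarial-training step with a hypothetical logit penalty."""
    x_adv = pgd_attack(model, x, y)
    logits = model(x_adv)
    loss = F.cross_entropy(logits, y)
    # Hypothetical logit-oriented term reusing the <dl/df, f(x)> quantity
    # from the Fisher-Rao sketch above; it operates on logits that are
    # already computed, so the extra cost is one reduction per batch.
    grad_f = F.softmax(logits, dim=1) - F.one_hot(y, logits.size(1)).float()
    loss = loss + loat_weight * (grad_f.detach() * logits).sum(1).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Consistent with the abstract's observation about training phases, one would activate `loat_weight` only during the initial and final epochs, which keeps the overhead of the added term negligible.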
- Spectrally-normalized margin bounds for neural networks, 2017.
- Vapnik-Chervonenkis dimension of neural nets. The Handbook of Brain Theory and Neural Networks, pages 1188–1192, 2003.
- Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
- End to end learning for self-driving cars, 2016.
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pages 2206–2216. PMLR, 2020.
- Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Adversarial examples for malware detection. In Computer Security–ESORICS 2017: 22nd European Symposium on Research in Computer Security, Oslo, Norway, September 11-15, 2017, Proceedings, Part II, pages 62–79. Springer, 2017.
- Model complexity of deep learning: A survey. Knowledge and Information Systems, 63:2585–2619, 2021.
- A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37:100270, 2020.
- Deep reinforcement learning. In Machine Learning Safety, pages 219–235. Springer, 2023.
- A survey of safety and trustworthiness of large language models through the lens of verification and validation. arXiv preprint arXiv:2305.11391, 2023.
- Enhancing adversarial training with second-order statistics of weights, 2022.
- Adversarial logit pairing, 2018.
- A simple weight decay can improve generalization. Advances in neural information processing systems, 4, 1991.
- Fisher-Rao metric, geometry, and complexity of neural networks. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 888–896. PMLR, 2019.
- A unified gradient regularization family for adversarial examples, 2015.
- Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Robustness via curvature regularization, and vice versa, 2018.
- Sparse adversarial video attacks with spatial transformations. In The 32nd British Machine Vision Conference (BMVC’21), 2021.
- Certified policy smoothing for cooperative multi-agent reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’23), 2023.
- Adversarial robustness may be at odds with simplicity, 2019.
- Path-SGD: Path-normalized optimization in deep neural networks, 2015.
- Norm-based capacity control in neural networks, 2015.
- Robustness and accuracy could be reconcilable by (proper) definition. In International Conference on Machine Learning, 2022.
- Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.
- Overfitting in adversarially robust deep learning, 2020.
- Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients, 2017.
- Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Robustness may be at odds with accuracy, 2019.
- Deep learning and its adversarial robustness: A brief introduction. In Handbook on Computer Learning and Intelligence: Volume 2: Deep Learning, Intelligent Control and Evolutionary Computation, pages 547–584. 2022.
- Self-adaptive adversarial training for robust medical segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI’23), pages 725–735. Springer, 2023.
- Improving adversarial robustness requires revisiting misclassified examples. In International Conference on Learning Representations, 2020.
- Understanding adversarial robustness of vision transformers via Cauchy problem. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD’22), 2022.
- Better diffusion models further improve adversarial training. arXiv preprint arXiv:2302.04638, 2023.
- Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems, 33:2958–2969, 2020.
- Adversarial driving: Attacking end-to-end autonomous driving. In 2023 IEEE Intelligent Vehicles Symposium (IV), pages 1–7. IEEE, 2023.
- A closer look at accuracy vs. robustness, 2020.
- DIMBA: Discretely masked black-box attack in single object tracking. Machine Learning, pages 1–19, 2022.
- ReRoGCRL: Representation-based robustness in goal-conditioned reinforcement learning. arXiv preprint arXiv:2312.07392, 2023.
- Understanding generalization in adversarial training via the bias-variance decomposition, 2021.
- Reachability analysis of neural network control systems. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI’23), 2023.
- Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.