Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement (2403.09101v1)
Abstract: Adversarial training (AT) is currently one of the most effective ways to improve the robustness of deep neural networks against adversarial attacks. However, most AT methods suffer from robust overfitting, i.e., a significant generalization gap in adversarial robustness between the training and testing curves. In this paper, we first identify a connection between robust overfitting and excessive memorization of noisy labels in AT from the perspective of gradient norm. As such label noise is mainly caused by distribution mismatch and improper label assignments, we are motivated to propose a label refinement approach for AT. Specifically, our Self-Guided Label Refinement first refines a more accurate and informative label distribution from over-confident hard labels; it then calibrates training by dynamically incorporating knowledge from self-distilled models into the current model, thereby requiring no external teacher. Empirical results demonstrate that our method simultaneously boosts standard accuracy and robust performance across multiple benchmark datasets, attack types, and architectures. In addition, we provide an information-theoretic analysis of our method that highlights the importance of soft labels for robust generalization.
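The abstract compresses the method into two steps: (i) soften the over-confident one-hot labels into a refined distribution, and (ii) guide adversarial training with that distribution using a self-distilled model rather than an external teacher. The PyTorch sketch below illustrates one plausible reading of this recipe under stated assumptions: the self-distilled teacher is taken to be an exponential-moving-average (EMA) copy of the model, `rho` is an assumed interpolation weight, and the PGD settings are standard defaults. These names and values are illustrative, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # Self-distilled teacher: exponential moving average of the student weights.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)


@torch.no_grad()
def refine_labels(teacher, x, y, num_classes, rho=0.7):
    # Soften the over-confident one-hot label by interpolating with the
    # teacher's predictive distribution (rho is an assumed mixing weight).
    one_hot = F.one_hot(y, num_classes).float()
    soft = F.softmax(teacher(x), dim=1)
    return rho * one_hot + (1.0 - rho) * soft


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Standard PGD on the cross-entropy loss to craft adversarial examples.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0.0, 1.0).detach()


def train_step(model, teacher, opt, x, y, num_classes):
    x_adv = pgd_attack(model, x, y)
    target = refine_labels(teacher, x, y, num_classes)
    # Soft-label cross-entropy: -sum_k target_k * log p_k(x_adv).
    loss = -(target * F.log_softmax(model(x_adv), dim=1)).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(teacher, model)
    return loss.item()
```

Here `teacher` would start as a frozen deep copy of `model` (e.g., `copy.deepcopy(model).eval()`); a production EMA update would also track buffers such as batch-norm statistics, omitted here for brevity.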
Authors: Daiwei Yu, Zhuorong Li, Lina Wei, Canghong Jin, Yun Zhang, Sixian Chan