
Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization (2404.08154v2)

Published 11 Apr 2024 in cs.LG

Abstract: Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness. However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier, making it vulnerable to multi-step adversarial attacks. In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour: although these training samples are generated by the inner maximization process, their associated loss decreases instead of increasing. We name such samples abnormal adversarial examples (AAEs). Upon further analysis, we discover a close relationship between AAEs and classifier distortion, as both the number and the outputs of AAEs vary significantly with the onset of CO. Given this observation, we re-examine the SSAT process and find that before CO occurs, the classifier already displays a slight distortion, indicated by the presence of a few AAEs. Furthermore, directly optimizing the classifier on these AAEs accelerates its distortion, and the variation of AAEs sharply increases as a result. In this vicious circle, the classifier rapidly becomes highly distorted, manifesting as CO within a few iterations. These observations motivate us to eliminate CO by hindering the generation of AAEs. Specifically, we design a novel method, termed Abnormal Adversarial Examples Regularization (AAER), which explicitly regularizes the variation of AAEs to hinder the classifier from becoming distorted. Extensive experiments demonstrate that our method can effectively eliminate CO and further boost adversarial robustness with negligible additional computational overhead.
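The mechanism described in the abstract can be illustrated with a short, hedged sketch. The snippet below shows one way a single-step (FGSM-style) training step could detect AAEs, i.e. samples whose loss decreases after the inner maximization step even though that step was meant to increase it, and add a penalty on their count and output variation. This is a minimal illustration based only on the abstract; the function name, the epsilon and weighting values, and the exact form of the penalty are assumptions rather than the paper's AAER implementation.

```python
# Minimal PyTorch sketch (an assumption, not the paper's code) of one
# single-step adversarial training step with AAE detection and a simple
# AAER-style penalty.
import torch
import torch.nn.functional as F


def ssat_step_with_aae_penalty(model, x, y, optimizer, eps=8 / 255, lam=1.0):
    """One training step: FGSM inner maximization, AAE detection, penalty."""
    model.train()
    x_clean = x.detach()

    # Inner maximization: single-step FGSM perturbation of the clean batch.
    x_req = x_clean.clone().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)[0]
    x_adv = (x_clean + eps * grad.sign()).clamp(0, 1).detach()

    # Per-sample losses on clean and adversarial inputs.
    logits_clean = model(x_clean)
    logits_adv = model(x_adv)
    loss_clean = F.cross_entropy(logits_clean, y, reduction="none")
    loss_adv = F.cross_entropy(logits_adv, y, reduction="none")

    # AAE detection: the perturbation was built to increase the loss, so a
    # sample whose adversarial loss is *lower* than its clean loss is abnormal.
    aae_mask = (loss_adv < loss_clean).detach()

    # AAER-style penalty (assumed form): discourage AAEs by penalizing
    # their fraction of the batch and the output shift they induce.
    if aae_mask.any():
        output_shift = (logits_adv[aae_mask] - logits_clean[aae_mask]).pow(2).mean()
        penalty = aae_mask.float().mean() + output_shift
    else:
        penalty = torch.zeros((), device=x.device)

    loss = loss_adv.mean() + lam * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item(), int(aae_mask.sum())
```

In this sketch, `lam` trades off the standard single-step adversarial loss against the AAE penalty; the "variation of AAEs" could equally be operationalized as logit differences, loss differences, or the AAE count alone, depending on the design chosen.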
