Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM (2403.11448v1)
Abstract: Numerous studies have demonstrated the susceptibility of deep neural networks (DNNs) to subtle adversarial perturbations, prompting the development of many advanced defense methods aimed at mitigating adversarial attacks. Current defense strategies usually train DNNs against a specific adversarial attack method and can achieve good robustness against that type of attack. Nevertheless, when evaluated under unfamiliar attack modalities, these DNNs exhibit a pronounced deterioration in robustness. Meanwhile, there is a trade-off between accuracy on clean examples and on adversarial examples: most defense methods sacrifice clean accuracy to improve adversarial robustness. To alleviate these problems and enhance the overall robust generalization of DNNs, we propose the Test-Time Pixel-Level Adversarial Purification (TPAP) method. TPAP exploits the robust overfitting of DNNs to the fast gradient sign method (FGSM) on both the training and test sets: it applies FGSM at test time to purify unknown adversarial perturbations from image pixels, in a "counter changes with changelessness" manner, thereby strengthening the defense of DNNs against various unknown adversarial attacks. Extensive experimental results show that our method effectively improves the overall robust generalization of DNNs, notably outperforming previous methods.
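Conceptually, the purification step is a single FGSM update computed against the model's own prediction: because an FGSM-adversarially-trained network robustly overfits to FGSM perturbations, adding that fixed, known perturbation at test time is intended to override whatever unknown perturbation an attacker applied. Below is a minimal PyTorch sketch of such a test-time step, under that assumption; the names `fgsm_purify` and `eps` are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_purify(model, x, eps=8 / 255):
    """One-step FGSM purification at test time (illustrative sketch).

    Assumes `model` was adversarially trained with FGSM, so that it
    robustly overfits to FGSM perturbations: adding the known FGSM
    perturbation to an incoming (possibly attacked) image pushes it
    back into a region the model classifies robustly.
    """
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # No ground-truth label is available at test time, so use the
    # model's own prediction as a pseudo-label for the FGSM step.
    pseudo_label = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pseudo_label)
    grad = torch.autograd.grad(loss, x)[0]
    # The FGSM step itself acts as the purification: a fixed, known
    # perturbation that "counters changes with changelessness".
    x_pur = (x + eps * grad.sign()).clamp(0.0, 1.0)
    return x_pur.detach()

# Usage: classify the purified image instead of the raw input, e.g.
# pred = model(fgsm_purify(model, images)).argmax(dim=1)
```

Presumably the purification budget `eps` would match the perturbation budget used during FGSM adversarial training, since the robust overfitting the method relies on is specific to that perturbation magnitude.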