AFD: Mitigating Feature Gap for Adversarial Robustness by Feature Disentanglement (2401.14707v2)
Abstract: Adversarial fine-tuning methods enhance adversarial robustness by fine-tuning a pre-trained model in an adversarial training manner. However, we identify that certain latent features of adversarial samples are confused by the adversarial perturbation, leading to an unexpectedly widening gap between the last-hidden-layer features of natural and adversarial samples. To address this issue, we propose a disentanglement-based approach that explicitly models and then removes these perturbation-specific latent features. We introduce a feature disentangler that separates the perturbation-specific latent features from the features of adversarial samples, boosting robustness by eliminating them. In addition, we align the clean features of the pre-trained model with the adversarial-sample features of the fine-tuned model, so that the fine-tuned model benefits from the intrinsic features of natural samples. Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
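The abstract describes the method only at a high level. Below is a minimal PyTorch-style sketch of one way the described components could be wired together: a disentangler that splits an adversarial sample's last-hidden-layer feature into an intrinsic part and a perturbation-specific part, plus loss terms that classify on the intrinsic part, reconstruct the adversarial feature from the two parts, and align the intrinsic part with the pre-trained model's clean feature. All module names, loss forms, and weightings here are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureDisentangler(nn.Module):
    """Hypothetical disentangler: splits an adversarial feature into an
    intrinsic component and a perturbation-specific component."""

    def __init__(self, feat_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.intrinsic_head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, feat_dim)
        )
        self.specific_head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, feat_dim)
        )

    def forward(self, adv_feat: torch.Tensor):
        # Return (intrinsic, perturbation-specific) components of the feature.
        return self.intrinsic_head(adv_feat), self.specific_head(adv_feat)


def afd_style_losses(disentangler, classifier_head, adv_feat, clean_feat_pretrained, labels):
    """Illustrative loss terms (forms and equal weighting are assumptions):
    - classify using only the intrinsic part of the adversarial feature,
    - reconstruct the adversarial feature from intrinsic + specific parts,
    - align the intrinsic adversarial feature with the frozen pre-trained clean feature.
    """
    intrinsic, specific = disentangler(adv_feat)
    cls_loss = F.cross_entropy(classifier_head(intrinsic), labels)
    recon_loss = F.mse_loss(intrinsic + specific, adv_feat)
    align_loss = F.mse_loss(intrinsic, clean_feat_pretrained.detach())
    return cls_loss + recon_loss + align_loss
```

In such a setup, `adv_feat` would come from the fine-tuned backbone on adversarial inputs and `clean_feat_pretrained` from the frozen pre-trained backbone on the corresponding natural inputs; only the fine-tuned backbone, disentangler, and classifier head would receive gradients.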