Adversarial Training on Purification (AToP): Advancing Both Robustness and Generalization (2401.16352v4)
Abstract: Deep neural networks are known to be vulnerable to well-designed adversarial attacks. The most successful defense technique, adversarial training (AT), achieves optimal robustness against particular attacks but does not generalize well to unseen attacks. Another effective defense technique, adversarial purification (AP), improves generalization but does not achieve optimal robustness. Both methods also share a common limitation: degraded standard accuracy. To mitigate these issues, we propose a novel pipeline for acquiring a robust purifier model, named Adversarial Training on Purification (AToP), which comprises two components: perturbation destruction by random transforms (RT) and fine-tuning (FT) of the purifier model with adversarial loss. RT is essential for avoiding overfitting to known attacks, which yields robustness that generalizes to unseen attacks, while FT is essential for improving robustness. To evaluate our method in an efficient and scalable way, we conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNette, demonstrating that our method achieves optimal robustness and generalizes to unseen attacks.
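To make the two-component pipeline concrete, below is a minimal PyTorch sketch of the RT-then-purify-then-classify flow and of one FT update step. The specific random transform (block masking), the purifier architecture, the loss weighting `lam`, and all function and class names here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the AToP pipeline (hypothetical names and architecture;
# the transform choices and loss weighting are assumptions for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

def random_transform(x: torch.Tensor) -> torch.Tensor:
    """RT step: destroy adversarial perturbations with a random operation.
    Here: random pixel masking, a stand-in for the paper's transforms."""
    mask = (torch.rand_like(x[:, :1]) > 0.25).float()  # drop ~25% of pixels
    return x * mask  # mask broadcasts across channels

class Purifier(nn.Module):
    """Placeholder purifier that restores the transformed image."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.clamp(self.net(x), 0.0, 1.0)  # keep outputs in image range

def atop_finetune_step(purifier, classifier, optimizer, x_adv, x_clean, y,
                       lam: float = 1.0) -> float:
    """FT step: update the purifier with a reconstruction loss plus an
    adversarial (classification) loss. The classifier stays frozen: its
    parameters are not in `optimizer`, though gradients flow through it."""
    purifier.train()
    classifier.eval()
    optimizer.zero_grad()
    x_purified = purifier(random_transform(x_adv))         # RT, then purify
    recon_loss = F.mse_loss(x_purified, x_clean)           # restore clean image
    cls_loss = F.cross_entropy(classifier(x_purified), y)  # adversarial loss
    loss = recon_loss + lam * cls_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time an input would go through the same flow before classification, e.g. `logits = classifier(purifier(random_transform(x)))`. Drawing fresh randomness in RT at every call reflects the abstract's point that the randomness prevents the purifier from overlearning any particular known attack.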