Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training (2106.05453v2)
Abstract: Deep neural networks (DNNs) are vulnerable to adversarial noise. A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise, among which input pre-processing methods are scalable and show great potential for safeguarding DNNs. However, pre-processing methods may suffer from a robustness degradation effect, in which the defense reduces rather than improves the adversarial robustness of a target model in the white-box setting. A potential cause of this negative effect is that the adversarial training examples are static and independent of the pre-processing model. To address this problem, we investigate the influence of full adversarial examples, which are crafted against the full model, and find that they indeed have a positive impact on the robustness of defenses. Furthermore, we find that simply changing the adversarial training examples used by pre-processing methods does not completely alleviate the robustness degradation effect. This is because the adversarial risk of the pre-processed model is neglected, which is another cause of the effect. Motivated by these analyses, we propose a method called Joint Adversarial Training based Pre-processing (JATP) defense. Specifically, we formulate a feature-similarity-based adversarial risk for the pre-processing model using full adversarial examples found in a feature space. Unlike standard adversarial training, we update only the pre-processing model, which prompts us to introduce a pixel-wise loss to improve its cross-model transferability. We then conduct joint adversarial training on the pre-processing model to minimize this overall risk. Empirical results show that our method effectively mitigates the robustness degradation effect across different target models in comparison with previous state-of-the-art approaches.
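The training loop described in the abstract can be made concrete with a minimal PyTorch-style sketch of a single JATP update, under stated assumptions: `preproc` is the trainable pre-processing network, `target_model.features` is a frozen feature extractor of the target classifier, and the cosine-similarity objective, PGD schedule, and weight `lambda_pix` are illustrative stand-ins for the paper's feature-similarity risk and pixel-wise loss, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def jatp_step(preproc, target_model, optimizer, x,
              eps=8/255, alpha=2/255, steps=10, lambda_pix=1.0):
    """One hypothetical JATP update: craft full adversarial examples against
    the composed model (preproc -> target) in feature space, then update
    only the pre-processing model on a feature-similarity + pixel-wise loss."""
    preproc.train()
    target_model.eval()
    for p in target_model.parameters():  # the target classifier stays fixed
        p.requires_grad_(False)

    with torch.no_grad():
        feat_clean = target_model.features(x)  # clean reference features

    # Inner maximization: full adversarial examples found in a feature space,
    # pushing pre-processed features away from the clean ones (PGD-style).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        feat_adv = target_model.features(preproc(x_adv))
        sim = F.cosine_similarity(feat_adv.flatten(1), feat_clean.flatten(1))
        grad = torch.autograd.grad(-sim.mean(), x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    # Outer minimization: update the pre-processing model only, combining the
    # feature-similarity risk with a pixel-wise loss for cross-model transfer.
    feat_def = target_model.features(preproc(x_adv))
    loss_feat = (1 - F.cosine_similarity(feat_def.flatten(1),
                                         feat_clean.flatten(1))).mean()
    loss_pix = F.mse_loss(preproc(x_adv), x)  # hypothetical pixel-wise term
    loss = loss_feat + lambda_pix * loss_pix
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # optimizer holds only preproc's parameters
    return loss.item()
```

Because only `preproc` is updated while the target classifier remains frozen, the pixel-wise term keeps the pre-processor's output close to the clean input, which is what allows the trained defense to transfer across different target models.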