Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness (2405.16036v1)
Abstract: Randomized smoothing has become a leading method for achieving certified robustness in deep classifiers against ℓp-norm adversarial perturbations. Current certification approaches, such as data augmentation with Gaussian noise and adversarial training, require expensive training procedures that tune large models for each Gaussian noise level and thus cannot leverage high-performance pre-trained neural networks. In this work, we introduce a novel certifying adapters framework (CAF) that enables and enhances the certification of classifier adversarial robustness. Our approach makes few assumptions about the underlying training algorithm or feature extractor and is thus broadly applicable to different feature-extractor architectures (e.g., convolutional neural networks or vision transformers) and smoothing algorithms. We show that CAF (a) enables certification of uncertified models pre-trained on clean datasets and (b) substantially improves the performance of classifiers certified via randomized smoothing and SmoothAdv at multiple radii on CIFAR-10 and ImageNet. We demonstrate that CAF achieves higher certified accuracies than methods based on randomized or denoised smoothing, and that CAF is insensitive to the certifying adapter's hyperparameters. Finally, we show that an ensemble of adapters enables a single pre-trained feature extractor to defend against a range of noise-perturbation scales.
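The abstract's core idea — attach a small trainable adapter to a frozen pre-trained feature extractor, then certify the combined classifier under Gaussian noise via randomized smoothing — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `CertifyingAdapterClassifier`, the adapter shape, and `smoothed_predict` are all hypothetical names, and the feature extractor is assumed to return a flat feature vector. A full certification procedure would additionally lower-bound the top-class probability (e.g., with a Clopper-Pearson interval) and convert it to an ℓ2 radius, as in Cohen et al. (2019).

```python
# Minimal sketch (hypothetical names and sizes): a frozen, pre-trained
# feature extractor paired with a small trainable adapter, evaluated under
# Gaussian noise as in randomized smoothing (Cohen et al., 2019).

import torch
import torch.nn as nn


class CertifyingAdapterClassifier(nn.Module):
    """Frozen feature extractor plus a small trainable adapter head."""

    def __init__(self, feature_extractor: nn.Module, feat_dim: int,
                 adapter_dim: int, num_classes: int):
        super().__init__()
        self.feature_extractor = feature_extractor.eval()
        for p in self.feature_extractor.parameters():
            p.requires_grad = False  # only the adapter receives gradient updates
        # Assumed adapter form: a small MLP over the frozen features.
        self.adapter = nn.Sequential(
            nn.Linear(feat_dim, adapter_dim),
            nn.ReLU(),
            nn.Linear(adapter_dim, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():  # extractor is frozen; assumed to output flat features
            feats = self.feature_extractor(x)
        return self.adapter(feats)


@torch.no_grad()
def smoothed_predict(model: nn.Module, x: torch.Tensor, sigma: float,
                     n_samples: int, num_classes: int) -> int:
    """Monte Carlo vote of the base classifier under N(0, sigma^2 I) noise.

    x is a single input of shape (1, C, H, W). A full certification would
    also lower-bound the top-class probability and convert it to a radius.
    """
    counts = torch.zeros(num_classes, dtype=torch.long)
    for _ in range(n_samples):
        noisy = x + sigma * torch.randn_like(x)
        counts[model(noisy).argmax(dim=1).item()] += 1
    return int(counts.argmax())
```

Under these assumptions, only the adapter is trained on Gaussian-noised inputs (e.g., with a torchvision ResNet truncated before its classification head as the extractor), which matches the abstract's claim that expensive, noise-level-specific retraining of the full model is avoided.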