Mixing Classifiers to Alleviate the Accuracy-Robustness Trade-Off (2311.15165v2)
Abstract: Deep neural classifiers have recently found tremendous success in data-driven control systems. However, existing models suffer from a trade-off between accuracy and adversarial robustness. This limitation must be overcome in the control of safety-critical systems that require both high performance and rigorous robustness guarantees. In this work, we develop classifiers that simultaneously inherit high robustness from robust models and high accuracy from standard models. Specifically, we propose a theoretically motivated formulation that mixes the output probabilities of a standard neural network and a robust neural network. Both base classifiers are pre-trained, and thus our method does not require additional training. Our numerical experiments verify that the mixed classifier noticeably improves the accuracy-robustness trade-off and identify the confidence property of the robust base classifier as the key enabler of this more benign trade-off. Our theoretical results prove that under mild assumptions, when the robustness of the robust base model is certifiable, no alteration or attack within a closed-form $\ell_p$ radius on an input can result in the misclassification of the mixed classifier.
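To make the mixing idea from the abstract concrete, the snippet below sketches one plausible way to combine the output probabilities of two pre-trained classifiers in PyTorch. The convex weight `alpha`, the model names, and the log-space return value are illustrative assumptions for this example, not the paper's exact formulation or guarantees.

```python
import torch
import torch.nn.functional as F

def mixed_classifier(std_model, robust_model, x, alpha=0.5):
    """Illustrative sketch: mix the output probabilities of a standard
    (accurate) classifier and a robust classifier, both pre-trained.

    alpha weights the robust model; the result is a convex combination
    of the two probability vectors, returned as log-probabilities so it
    can be passed to the usual argmax / NLL routines.
    """
    with torch.no_grad():
        p_std = F.softmax(std_model(x), dim=-1)     # accurate base classifier
        p_rob = F.softmax(robust_model(x), dim=-1)  # robust base classifier
    p_mix = (1.0 - alpha) * p_std + alpha * p_rob   # mixed output probabilities
    return p_mix.log()

# Usage (hypothetical models): predictions come from the mixture's argmax.
# y_hat = mixed_classifier(std_net, rob_net, batch).argmax(dim=-1)
```

Under this sketch, `alpha` trades off clean accuracy (small `alpha`) against robustness (large `alpha`); the paper's analysis ties the certified radius of the mixture to the confidence of the robust base classifier.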