Towards Scalable and Robust Model Versioning (2401.09574v2)
Abstract: As the deployment of deep learning models continues to expand across industries, the threat of malicious incursions aimed at gaining access to these deployed models is on the rise. Should an attacker gain access to a deployed model, whether through server breaches, insider attacks, or model inversion techniques, they can construct white-box adversarial attacks to manipulate the model's classification outcomes, posing significant risks to organizations that rely on these models for critical tasks. Model owners need mechanisms to protect themselves against such losses without acquiring fresh training data, a process that typically demands substantial investments in time and capital. In this paper, we explore the feasibility of generating multiple versions of a model that possess different attack properties, without acquiring new training data or changing the model architecture. The model owner can deploy one version at a time and immediately replace a leaked version with a new one. The newly deployed version can resist adversarial attacks generated using white-box access to one or all previously leaked versions. We show theoretically that this can be accomplished by incorporating parameterized hidden distributions into the model's training data, forcing the model to learn task-irrelevant features uniquely defined by the chosen data. Furthermore, optimal choices of hidden distributions can produce a sequence of model versions capable of resisting compound transferability attacks over time. Leveraging these analytical insights, we design and implement a practical model versioning method for DNN classifiers, which yields significant robustness improvements over existing methods. We believe our work presents a promising direction for safeguarding DNN services beyond their initial deployment.
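The abstract describes training each model version on data augmented with samples from a version-specific hidden distribution, so that different versions learn different task-irrelevant features. The sketch below illustrates that idea in PyTorch under stated assumptions: the function names, the seeded random pattern used as the hidden distribution, the blending coefficient `alpha`, and the relabeling to a target class are all illustrative choices, not the paper's exact parameterization.

```python
# Illustrative sketch (not the paper's implementation): each model version is
# trained on the original data plus copies blended with a version-specific
# pattern, which stands in for the "parameterized hidden distribution".
import torch
import torch.nn.functional as F


def make_hidden_pattern(version_seed: int, image_shape=(3, 32, 32)):
    """Derive a version-specific pattern from a seed (assumed parameterization)."""
    g = torch.Generator().manual_seed(version_seed)
    pattern = torch.randn(image_shape, generator=g)
    return pattern / pattern.norm()  # unit-norm template


def embed_hidden_data(images, labels, pattern, alpha=0.1, target_label=0):
    """Blend the hidden pattern into a copy of the batch and relabel it,
    encouraging the model to tie the pattern to a chosen class."""
    blended = (1 - alpha) * images + alpha * pattern  # broadcasts over the batch
    hidden_labels = torch.full_like(labels, target_label)
    return torch.cat([images, blended]), torch.cat([labels, hidden_labels])


def train_version(model, loader, version_seed, epochs=1, lr=1e-3, device="cpu"):
    """Train one model version on data augmented with its hidden distribution."""
    pattern = make_hidden_pattern(version_seed).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            x, y = embed_hidden_data(images, labels, pattern)
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
    return model
```

In this sketch, releasing version k would amount to training a fresh copy of the same architecture with `train_version(model, loader, version_seed=k)`; the paper's theoretical contribution concerns how the hidden distributions should be chosen so that successive versions resist attacks transferred from previously leaked ones.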