MisGUIDE: Defense Against Data-Free Deep Learning Model Extraction (2403.18580v1)
Abstract: The rise of Machine Learning as a Service (MLaaS) has led to the widespread deployment of machine learning models trained on diverse datasets. These models serve predictions through APIs, and emerging vulnerabilities in prediction APIs raise concerns about the security and confidentiality of the models. Of particular concern are model cloning attacks, in which an adversary with limited data and no knowledge of the training dataset replicates a victim model's functionality through black-box query access, typically by generating adversarial queries to the victim model and using its responses to build a labeled dataset. This paper proposes "MisGUIDE", a two-step defense framework for deep learning models that disrupts the adversarial sample generation process by returning a probabilistic response whenever a query is deemed out-of-distribution (OOD). The first step employs a Vision Transformer-based framework to identify OOD queries; the second step perturbs the responses to such queries, introducing a probabilistic loss function to misguide the attackers. The aim of the proposed defense is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries. Extensive experiments on two benchmark datasets demonstrate that the proposed framework significantly increases resistance to state-of-the-art data-free model extraction attacks in black-box settings.
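The abstract describes a two-step, query-time defense: flag out-of-distribution queries with a Vision Transformer-based detector, then answer flagged queries with a perturbed, probabilistic response. The sketch below illustrates that control flow only; the `ood_detector` interface, the `ood_threshold` and `misguide_prob` parameters, and the random-probability perturbation are assumptions for illustration, not the paper's exact mechanism or loss function.

```python
import torch
import torch.nn.functional as F


class MisGuideDefense:
    """Illustrative wrapper around a victim classifier.

    Step 1: score each query with an OOD detector (assumed to return a
            per-sample score where higher means "more likely OOD").
    Step 2: for OOD queries, return a misleading response with some
            probability instead of the true prediction.
    """

    def __init__(self, victim_model, ood_detector,
                 ood_threshold=0.5, misguide_prob=0.9):
        self.victim = victim_model          # protected model (outputs logits)
        self.detector = ood_detector        # e.g. a ViT-based OOD scorer (assumed interface)
        self.ood_threshold = ood_threshold  # score above which a query is treated as OOD
        self.misguide_prob = misguide_prob  # chance of misguiding an OOD query

    @torch.no_grad()
    def predict(self, x):
        probs = F.softmax(self.victim(x), dim=1)   # honest class probabilities
        ood_score = self.detector(x)               # shape: (batch,)
        is_ood = ood_score > self.ood_threshold
        misguide = is_ood & (torch.rand_like(ood_score) < self.misguide_prob)

        if misguide.any():
            # Replace flagged rows with random probability vectors; the
            # paper's probabilistic perturbation may differ in detail.
            rand = torch.rand_like(probs[misguide])
            probs[misguide] = rand / rand.sum(dim=1, keepdim=True)
        return probs
```

In-distribution queries pass through unchanged, which is how a defense of this shape can preserve accuracy on authentic queries while degrading the labels an extraction attack collects.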
- Mahendra Gurve
- Sankar Behera
- Satyadev Ahlawat
- Yamuna Prasad