Extracting Cloud-based Model with Prior Knowledge (2306.04192v4)
Abstract: Machine Learning-as-a-Service, a pay-as-you-go business pattern, is widely accepted by third-party users and developers. However, open inference APIs may be exploited by malicious customers to conduct model extraction attacks, i.e., attackers can replicate a cloud-based black-box model merely by querying it with crafted examples. Existing model extraction attacks mainly depend on posterior knowledge (i.e., the predictions returned for query samples) from the oracle. Thus, they either require high query overhead to simulate the decision boundary, or suffer from generalization errors and overfitting problems due to query budget limitations. To mitigate these issues, this work proposes, for the first time, an efficient model extraction attack based on prior knowledge. The insight is that prior knowledge of an unlabeled proxy dataset is conducive to the search for the decision boundary (e.g., informative samples). Specifically, we leverage self-supervised learning, including autoencoders and contrastive learning, to pre-compile the prior knowledge of the proxy dataset into the feature extractor of the substitute model. We then use entropy to measure and select the most informative examples with which to query the target model. Our design leverages both prior and posterior knowledge to extract the model and thus eliminates the generalization errors and overfitting problems. We conduct extensive experiments on open APIs such as Traffic Recognition, Flower Recognition, Moderation Recognition, and NSFW Recognition from the real-world platforms Azure and Clarifai. The experimental results demonstrate the effectiveness and efficiency of our attack. For example, our attack achieves 95.1% fidelity with merely 1.8K queries (costing $2.16) on the NSFW Recognition API. Moreover, adversarial examples generated with our substitute model transfer better than those produced by other methods, which shows that our scheme is more conducive to downstream attacks.
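The entropy-based query selection step described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' released implementation: it presumes PyTorch, a `substitute` classifier whose feature extractor has already been pre-trained with self-supervised learning on the unlabeled proxy data, a `proxy_loader` yielding unlabeled image batches, and a query `budget` — all of these names are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' code): rank unlabeled proxy
# samples by the substitute model's prediction entropy and keep the most
# uncertain ones, i.e., those presumed closest to the decision boundary.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_informative(substitute, proxy_loader, budget):
    """Return the `budget` proxy samples with the highest prediction entropy
    under the current substitute model."""
    scores, samples = [], []
    for x in proxy_loader:                       # unlabeled proxy batches
        probs = F.softmax(substitute(x), dim=1)  # substitute posteriors
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
        scores.append(entropy)
        samples.append(x)
    scores = torch.cat(scores)
    samples = torch.cat(samples)
    top = scores.topk(budget).indices            # most informative samples
    return samples[top]

# The selected samples would then be sent to the target API, and its
# predictions (posterior knowledge) used to fine-tune the substitute model.
```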