FDINet: Protecting against DNN Model Extraction via Feature Distortion Index (2306.11338v3)
Abstract: Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINET, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the features of the adversary's queries, we reveal that their feature distribution deviates from that of the model's training set. Based on this key observation, we propose the Feature Distortion Index (FDI), a metric designed to quantitatively measure the feature distribution deviation of received queries. FDINET uses FDI to train a binary detector and exploits FDI similarity to identify colluding adversaries mounting distributed extraction attacks. We conduct extensive experiments to evaluate FDINET against six state-of-the-art extraction attacks on four benchmark datasets and four popular model architectures. Empirical results demonstrate the following findings: FDINET is highly effective in detecting model extraction, achieving 100% detection accuracy on DFME and DaST; it is highly efficient, using just 50 queries to raise an extraction alarm with an average confidence of 96.08% on GTSRB; it identifies colluding adversaries with an accuracy exceeding 91%; and it detects two types of adaptive attacks.
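To make the detection pipeline in the abstract concrete, the following Python sketch mimics its structure under explicit assumptions: intermediate-layer features are summarized by per-class training centroids, the distance of a query's features to the centroid of its predicted class stands in for the Feature Distortion Index, a batch alarm fires when the mean deviation exceeds a threshold calibrated on benign traffic, and cosine similarity between clients' deviation profiles stands in for the FDI-similarity test for colluding adversaries. This is not the authors' implementation; every function name, the centroid-based reference, the thresholding rule, and the toy random features are illustrative assumptions.

```python
# Hedged sketch: NOT the authors' FDINet implementation. It only illustrates
# the idea described in the abstract: score how far the feature distribution
# of incoming queries deviates from the training set, raise an alarm when a
# query batch is anomalously distorted, and compare clients' deviation
# profiles to spot colluding (distributed) extractors.
import numpy as np


def fit_class_centroids(train_feats, train_labels):
    """Per-class mean of training-set features, used as the reference distribution."""
    return {c: train_feats[train_labels == c].mean(axis=0)
            for c in np.unique(train_labels)}


def distortion_scores(query_feats, pred_labels, centroids):
    """Distance of each query's features to the centroid of its predicted class,
    a stand-in for the paper's Feature Distortion Index (FDI)."""
    return np.array([np.linalg.norm(f - centroids[c])
                     for f, c in zip(query_feats, pred_labels)])


def batch_is_suspicious(scores, threshold):
    """Flag a query batch whose mean deviation exceeds a threshold calibrated
    on benign (training-like) traffic."""
    return scores.mean() > threshold


def profile_similarity(profile_a, profile_b):
    """Cosine similarity between two clients' deviation profiles; a high value
    hints that the clients may be colluding in a distributed attack."""
    a, b = np.asarray(profile_a, dtype=float), np.asarray(profile_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for intermediate-layer features of the defended DNN.
    train_feats = rng.normal(size=(1000, 64))
    train_labels = rng.integers(0, 10, size=1000)
    centroids = fit_class_centroids(train_feats, train_labels)

    # Calibrate the alarm threshold on the training set itself.
    train_scores = distortion_scores(train_feats, train_labels, centroids)
    threshold = np.quantile(train_scores, 0.95)

    benign = distortion_scores(rng.normal(size=(50, 64)),
                               rng.integers(0, 10, size=50), centroids)
    shifted = distortion_scores(rng.normal(loc=2.0, size=(50, 64)),
                                rng.integers(0, 10, size=50), centroids)
    print("benign batch flagged: ", batch_is_suspicious(benign, threshold))
    print("shifted batch flagged:", batch_is_suspicious(shifted, threshold))

    # Hypothetical per-class mean deviations reported by two clients; near-
    # identical profiles would suggest a coordinated, distributed extraction.
    client_a = [1.2, 3.4, 0.9, 2.8]
    client_b = [1.1, 3.5, 1.0, 2.7]
    print("profile similarity:", profile_similarity(client_a, client_b))
```

In FDINET itself the detector is a trained binary classifier rather than a hand-set threshold; the sketch only preserves the deviation-then-decision structure and the similarity-based collusion check described in the abstract.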
References
- H. Zhang, Y. Li, Y. Huang, Y. Wen, J. Yin, and K. Guan, “Mlmodelci: An automatic cloud platform for efficient mlaas,” in Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 4453–4456.
- “Machine learning as a service market size,” https://www.mordorintelligence.com/industry-reports/global-machine-learning-as-a-service-mlaas-market, accessed: 2023-05-30.
- F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing machine learning models via prediction apis,” in 25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 601–618.
- M. Jagielski, N. Carlini, D. Berthelot, A. Kurakin, and N. Papernot, “High accuracy and high fidelity extraction of neural networks,” in 29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 1345–1362.
- Y. Shen, X. He, Y. Han, and Y. Zhang, “Model Stealing Attacks Against Inductive Graph Neural Networks,” in SP 2022 - 43rd IEEE Symposium on Security and Privacy.   San Francisco, United States: IEEE, May 2022, pp. 1–22.
- Z. Sha, X. He, N. Yu, M. Backes, and Y. Zhang, “Can’t steal? cont-steal! contrastive stealing attacks against image encoders,” arXiv preprint arXiv:2201.07513, 2022.
- K. Krishna, G. S. Tomar, A. P. Parikh, N. Papernot, and M. Iyyer, “Thieves on sesame street! model extraction of bert-based apis,” in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.
- X. Gong, Q. Wang, Y. Chen, W. Yang, and X. Jiang, “Model extraction attacks and defenses on cloud-based machine learning models,” IEEE Communications Magazine, vol. 58, no. 12, pp. 83–89, 2020.
- R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership inference attacks against machine learning models,” in 2017 IEEE Symposium on Security and Privacy (SP).   IEEE, 2017, pp. 3–18.
- N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in Proceedings of the 2017 ACM on Asia conference on computer and communications security, 2017, pp. 506–519.
- M. Zhou, J. Wu, Y. Liu, S. Liu, and C. Zhu, “Dast: Data-free substitute training for adversarial attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 234–243.
- W. Wang, B. Yin, T. Yao, L. Zhang, Y. Fu, S. Ding, J. Li, F. Huang, and X. Xue, “Delving into data: Effectively substitute training for black-box attack,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4761–4770.
- C. Ma, L. Chen, and J.-H. Yong, “Simulating unknown target models for query-efficient black-box attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11835–11844.
- O. Bastani, C. Kim, and H. Bastani, “Interpreting blackbox models via model extraction,” arXiv preprint arXiv:1705.08504, 2017.
- D. Kazhdan, B. Dimanov, M. Jamnik, and P. Liò, “Meme: generating rnn model explanations via model extraction,” arXiv preprint arXiv:2012.06954, 2020.
- M. Kesarwani, B. Mukhoty, V. Arya, and S. Mehta, “Model extraction warning in mlaas paradigm,” in Proceedings of the 34th Annual Computer Security Applications Conference, 2018, pp. 371–380.
- M. Juuti, S. Szyller, S. Marchal, and N. Asokan, “Prada: protecting against dnn model stealing attacks,” in 2019 IEEE European Symposium on Security and Privacy (EuroS&P).   IEEE, 2019, pp. 512–527.
- A. M. Sadeghzadeh, F. Dehghan, A. M. Sobhanian, and R. Jalili, “Hardness of samples is all you need: Protecting deep learning models using hardness of samples,” arXiv preprint arXiv:2106.11424, 2021.
- Z. Zhang, Y. Chen, and D. Wagner, “Seat: Similarity encoder by adversarial training for detecting model extraction attack queries,” in Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, 2021, pp. 37–48.
- J. Lee, S. Han, and S. Lee, “Model stealing defense against exploiting information leak through the interpretation of deep neural nets,” in Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI), 2022.
- S. Kariyappa, A. Prakash, and M. K. Qureshi, “Protecting dnns from theft using an ensemble of diverse models,” in International Conference on Learning Representations, 2020.
- H. Zheng, Q. Ye, H. Hu, C. Fang, and J. Shi, “Bdpl: A boundary differentially private layer against machine learning model extraction attacks,” in European Symposium on Research in Computer Security.   Springer, 2019, pp. 66–83.
- H. Zheng, Q. Ye, H. Hu, C. Fang, and J. Shi, “Protecting decision boundary of machine learning model with differentially private perturbation,” IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 3, pp. 2007–2022, 2020.
- T. Orekondy, B. Schiele, and M. Fritz, “Prediction poisoning: Towards defenses against DNN model stealing attacks,” in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.
- A. Dziedzic, M. A. Kaleem, Y. S. Lu, and N. Papernot, “Increasing the cost of model extraction with calibrated proof of work,” in The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022.   OpenReview.net, 2022.
- S. Kariyappa and M. K. Qureshi, “Defending against model stealing attacks with adaptive misinformation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 770–778.
- S. Pal, Y. Gupta, A. Kanade, and S. Shevade, “Stateful detection of model extraction attacks,” arXiv preprint arXiv:2107.05166, 2021.
- T. Orekondy, B. Schiele, and M. Fritz, “Knockoff nets: Stealing functionality of black-box models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4954–4963.
- S. Pal, Y. Gupta, A. Shukla, A. Kanade, S. Shevade, and V. Ganapathy, “Activethief: Model extraction using active learning and unannotated public data,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, 2020, pp. 865–872.
- V. Chandrasekaran, K. Chaudhuri, I. Giacomelli, S. Jha, and S. Yan, “Exploring connections between active learning and model extraction,” in 29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 1309–1326.
- J. R. Correia-Silva, R. F. Berriel, C. Badue, A. F. de Souza, and T. Oliveira-Santos, “Copycat cnn: Stealing knowledge by persuading confession with random non-labeled data,” in 2018 International Joint Conference on Neural Networks (IJCNN).   IEEE, 2018, pp. 1–8.
- X. He, J. Jia, M. Backes, N. Z. Gong, and Y. Zhang, “Stealing links from graph neural networks.” in USENIX Security Symposium, 2021, pp. 2669–2686.
- Y. Wang, H. Qian, and C. Miao, “Dualcf: Efficient model extraction attack from counterfactual explanations,” in 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 1318–1329.
- H. Yu, K. Yang, T. Zhang, Y.-Y. Tsai, T.-Y. Ho, and Y. Jin, “Cloudleak: Large-scale deep learning models stealing through adversarial examples,” in Proceedings of Network and Distributed Systems Security Symposium (NDSS), 2020.
- A. Barbalau, A. Cosma, R. T. Ionescu, and M. Popescu, “Black-box ripper: Copying black-box models using generative evolutionary algorithms,” in Advances in Neural Information Processing Systems (NeurIPS), 2020.
- J.-B. Truong, P. Maini, R. J. Walls, and N. Papernot, “Data-free model extraction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021.
- S. Sanyal, S. Addepalli, and R. V. Babu, “Towards data-free model stealing in a hard label setting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 15284–15293.
- T. Miura, S. Hasegawa, and T. Shibahara, “Megex: Data-free model extraction attack against gradient-based explainable ai,” arXiv preprint arXiv:2107.08909, 2021.
- S. Kariyappa, A. Prakash, and M. K. Qureshi, “Maze: Data-free model stealing attack using zeroth-order gradient estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13814–13823.
- X. Gong, Y. Chen, W. Yang, G. Mei, and Q. Wang, “Inversenet: Augmenting model extraction attacks with training data inversion,” in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, Virtual Event / Montreal, Canada, 19-27 August 2021, Z. Zhou, Ed.   ijcai.org, 2021, pp. 2439–2447.
- Z. Ma, X. Liu, Y. Liu, X. Liu, Z. Qin, and K. Ren, “Divtheft: An ensemble model stealing attack by divide-and-conquer,” IEEE Transactions on Dependable and Secure Computing, 2023.
- S. Szyller, B. G. Atli, S. Marchal, and N. Asokan, “Dawn: Dynamic adversarial watermarking of neural networks,” in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 4417–4425.
- H. Jia, C. A. Choquette-Choo, V. Chandrasekaran, and N. Papernot, “Entangled watermarks as a defense against model extraction.” in USENIX Security Symposium, 2021, pp. 1937–1954.
- N. Lukas, Y. Zhang, and F. Kerschbaum, “Deep neural network fingerprinting by conferrable adversarial examples,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021.
- Y. Chen, C. Shen, C. Wang, and Y. Zhang, “Teacher model fingerprinting attacks against transfer learning,” in 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 3593–3610.
- X. Pan, Y. Yan, M. Zhang, and M. Yang, “Metav: A meta-verifier approach to task-agnostic model fingerprinting,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 1327–1336.
- P. Maini, M. Yaghini, and N. Papernot, “Dataset inference: Ownership resolution in machine learning,” in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021.   OpenReview.net, 2021.
- A. Dziedzic, H. Duan, M. A. Kaleem, N. Dhawan, J. Guan, Y. Cattan, F. Boenisch, and N. Papernot, “Dataset inference for self-supervised models,” arXiv preprint arXiv:2209.09024, 2022.
- L. Zhu, Y. Li, X. Jia, Y. Jiang, S.-T. Xia, and X. Cao, “Defending against model stealing via verifying embedded external features,” in ICML 2021 Workshop on Adversarial Machine Learning, 2021.
- Y. Li, L. Zhu, X. Jia, Y. Jiang, S.-T. Xia, and X. Cao, “Defending against model stealing via verifying embedded external features,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 1464–1472.
- E. Quiring, D. Arp, and K. Rieck, “Forgotten siblings: Unifying attacks on machine learning and digital watermarking,” in 2018 IEEE European Symposium on Security and Privacy (EuroS&P).   IEEE, 2018, pp. 488–502.
- T. Lee, B. Edwards, I. Molloy, and D. Su, “Defending against neural network model stealing attacks using deceptive perturbations,” in 2019 IEEE Security and Privacy Workshops (SPW).   IEEE, 2019, pp. 43–49.
- J. Chen, C. Wu, S. Shen, X. Zhang, and J. Chen, “Das-ast: Defending against model stealing attacks based on adaptive softmax transformation,” in International Conference on Information Security and Cryptology.   Springer, 2020, pp. 21–36.
- M. Mazeika, B. Li, and D. Forsyth, “How to steer your adversary: Targeted and efficient model stealing defenses with gradient redirection,” in International Conference on Machine Learning.   PMLR, 2022, pp. 15241–15254.
- A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
- J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, “Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition,” Neural networks, vol. 32, pp. 323–332, 2012.
- Z. Liu, P. Luo, X. Wang, and X. Tang, “Large-scale CelebFaces Attributes (CelebA) dataset,” retrieved August 15, 2018.
- N. Codella, V. Rotemberg, P. Tschandl, M. E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti et al., “Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic),” arXiv preprint arXiv:1902.03368, 2019.
- L. N. Darlow, E. J. Crowley, A. Antoniou, and A. J. Storkey, “Cinic-10 is not imagenet or cifar-10,” arXiv preprint arXiv:1810.03505, 2018.
- G. B. Huang, M. Mattar, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” in Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, 2008.
- M. Combalia, N. C. Codella, V. Rotemberg, B. Helba, V. Vilaplana, O. Reiter, C. Carrera, A. Barreiro, A. C. Halpern, S. Puig et al., “Bcn20000: Dermoscopic lesions in the wild,” arXiv preprint arXiv:1908.02288, 2019.
- S. Zanella-Beguelin, S. Tople, A. Paverd, and B. Köpf, “Grey-box extraction of natural language models,” in International Conference on Machine Learning.   PMLR, 2021, pp. 12278–12286.