
MEA-Defender: A Robust Watermark against Model Extraction Attack (2401.15239v1)

Published 26 Jan 2024 in cs.CR and cs.LG

Abstract: Recently, numerous highly valuable Deep Neural Networks (DNNs) have been trained using deep learning algorithms. To protect the Intellectual Property (IP) of the original owners of such DNN models, backdoor-based watermarks have been extensively studied. However, most such watermarks fail under model extraction attacks, which query the target model with input samples, obtain the corresponding outputs, and train a substitute model on these input-output pairs. In this paper, we propose MEA-Defender, a novel watermark that protects the IP of DNN models against model extraction. In particular, we obtain the watermark by combining two samples from two source classes in the input domain, and we design a watermark loss function that keeps the output domain of the watermark within that of the main-task samples. Since both the input domain and the output domain of our watermark are indispensable parts of those of the main-task samples, the watermark is extracted into the stolen model along with the main task during model extraction. We conduct extensive experiments on four model extraction attacks, using five datasets and six models trained with supervised and self-supervised learning algorithms. The experimental results demonstrate that MEA-Defender is highly robust against different model extraction attacks, as well as various watermark removal and detection approaches.
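The abstract describes constructing the watermark input by combining two samples from two source classes. As a minimal illustration of that idea, the sketch below splices halves of two inputs into one composite sample; the splice-style combination and the function `make_watermark_sample` are assumptions for illustration, not the paper's actual combining operation.

```python
import numpy as np

def make_watermark_sample(x_a: np.ndarray, x_b: np.ndarray) -> np.ndarray:
    """Hypothetical combiner: build one composite watermark input from
    two samples drawn from two different source classes, by keeping the
    left half of x_a and the right half of x_b along the last axis."""
    assert x_a.shape == x_b.shape, "samples must share a shape"
    half = x_a.shape[-1] // 2
    w = x_a.copy()
    w[..., half:] = x_b[..., half:]  # splice in the second class's half
    return w

# Toy 4x4 single-channel "images" standing in for two source classes.
x_a = np.zeros((1, 4, 4))
x_b = np.ones((1, 4, 4))
wm = make_watermark_sample(x_a, x_b)
```

Because the composite lies inside the main task's input domain (each half is a genuine sample), a model extracted by querying on in-distribution data would, per the paper's claim, inherit the watermark behavior along with the main task.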

Authors (9)
  1. Peizhuo Lv
  2. Hualong Ma
  3. Kai Chen
  4. Jiachen Zhou
  5. Shengzhi Zhang
  6. Ruigang Liang
  7. Shenchen Zhu
  8. Pan Li
  9. Yingjun Zhang
Citations (3)
