Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model (2307.08424v3)
Abstract: Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models, posing a privacy threat. Existing MIAs focus primarily on the white-box scenario, where the attacker has full access to the model's structure and parameters. In practice, however, deployed models are usually accessible only in black-box or label-only scenarios, i.e., the attacker can obtain only output confidence vectors or predicted labels by querying the model. As a result, the attack models in existing MIAs are difficult to train effectively with such limited knowledge of the target model, leading to sub-optimal attacks. To the best of our knowledge, we are the first to study a powerful and practical attack model for the label-only scenario. In this paper, we develop a novel MIA method that leverages a conditional diffusion model (CDM) to recover representative samples of the target label from the training set. Two techniques are introduced: (1) selecting an auxiliary dataset relevant to the target model's task and using the target model's predicted labels as conditions to guide CDM training; and (2) feeding the target label, a pre-defined guidance strength, and random noise into the trained attack model to generate and correct multiple candidates, from which the final result is selected. The method is evaluated with Learned Perceptual Image Patch Similarity (LPIPS) as a new metric and as a basis for choosing hyper-parameter values. Experimental results show that the method generates samples that are similar and accurate with respect to the target label, outperforming the generators of previous approaches.
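The abstract outlines a three-step attack pipeline: pseudo-labeling an auxiliary dataset through label-only queries, training a conditional diffusion model on the resulting (image, predicted-label) pairs, and sampling candidates for a target label under a chosen guidance strength. The sketch below illustrates these steps under assumptions not stated in the abstract: a PyTorch setting, a hypothetical label-conditioned denoiser network `denoiser(x_t, t, y)`, a reserved `null_label` class used for classifier-free guidance, and standard DDPM training and sampling formulas. It is a minimal illustration, not the authors' implementation; the candidate-correction step and the LPIPS-based evaluation and hyper-parameter selection mentioned in the abstract are omitted.

```python
# Minimal sketch of the label-only pipeline described above (not the authors'
# code). Assumptions: `denoiser` is a label-conditioned noise-prediction
# network (e.g. a conditional U-Net) taking (x_t, t, y); `null_label` is an
# extra class index reserved for the unconditional case.
import torch
import torch.nn.functional as F


@torch.no_grad()
def pseudo_label(target_model, aux_loader, device="cpu"):
    """Step 1: query the black-box target model and keep only its top-1 labels."""
    images, labels = [], []
    for x in aux_loader:                          # auxiliary images only, no labels
        x = x.to(device)
        y = target_model(x).argmax(dim=1)         # label-only access
        images.append(x.cpu())
        labels.append(y.cpu())
    return torch.cat(images), torch.cat(labels)


def ddpm_train_step(denoiser, optimizer, x0, y, betas, null_label, p_uncond=0.1):
    """Step 2: one DDPM training step conditioned on the predicted labels, with
    random label dropout so the model also learns an unconditional score
    (classifier-free guidance training)."""
    betas = betas.to(x0.device)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    t = torch.randint(0, betas.numel(), (x0.size(0),), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    drop = torch.rand(x0.size(0), device=x0.device) < p_uncond
    y_in = torch.where(drop, torch.full_like(y, null_label), y)
    loss = F.mse_loss(denoiser(x_t, t, y_in), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


@torch.no_grad()
def guided_sample(denoiser, target_label, w, betas, shape, null_label, device="cpu"):
    """Step 3: classifier-free guided ancestral sampling from random noise for a
    chosen target label and guidance strength w; call repeatedly to obtain
    multiple candidates for the final selection step."""
    betas = betas.to(device)
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    y = torch.full((shape[0],), target_label, device=device, dtype=torch.long)
    y_null = torch.full_like(y, null_label)
    for i in reversed(range(betas.numel())):
        t = torch.full((shape[0],), i, device=device, dtype=torch.long)
        # guided noise estimate: (1 + w) * conditional - w * unconditional
        eps = (1 + w) * denoiser(x, t, y) - w * denoiser(x, t, y_null)
        mean = (x - betas[i] / (1 - alphas_bar[i]).sqrt() * eps) / alphas[i].sqrt()
        x = mean + betas[i].sqrt() * torch.randn_like(x) if i > 0 else mean
    return x
```

In this reading of the abstract, the guidance strength `w` trades off fidelity to the target label against sample diversity, which is why the paper treats it as a hyper-parameter to be chosen via LPIPS; the actual architecture, noise schedule, and selection criterion used by the authors may differ.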