Similarity-based Label Inference Attack against Training and Inference of Split Learning (2203.05222v2)
Abstract: Split learning is a promising paradigm for privacy-preserving distributed learning. The learning model is cut into multiple portions that the participants train collaboratively by exchanging only the intermediate results at the cut layer. Understanding the security of split learning is therefore critical for many privacy-sensitive applications. This paper shows that the exchanged intermediate results, namely the smashed data (i.e., features extracted from the raw data) and the gradients exchanged during the training and inference of split learning, can already reveal the private labels. We mathematically analyze the potential label leakage and propose cosine and Euclidean similarity measurements for gradients and smashed data, respectively; the two measurements are then shown to be unified in Euclidean space. Based on this similarity metric, we design three label inference attacks that efficiently recover private labels during both the training and inference phases. Experimental results validate that the proposed attacks achieve close to 100% label inference accuracy and remain accurate against various state-of-the-art defense mechanisms, including DP-SGD, label differential privacy, gradient compression, and Marvell.
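To make the similarity principle concrete, here is a minimal, hypothetical Python/NumPy sketch of nearest-neighbor label matching on intercepted intermediate results: cosine similarity for cut-layer gradients and Euclidean distance for smashed data. It only illustrates the general idea under the assumption that the attacker holds a few reference examples with known labels; the function names and toy data are invented for illustration and do not reproduce the paper's exact attack procedures.

```python
# Illustrative sketch only (not the paper's exact algorithm): match each
# intercepted gradient / smashed-data vector to the most similar reference
# example whose label the attacker already knows.
import numpy as np

def infer_labels_by_cosine(grads, ref_grads, ref_labels):
    """Label each cut-layer gradient with the label of its most cosine-similar reference gradient."""
    g = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    r = ref_grads / np.linalg.norm(ref_grads, axis=1, keepdims=True)
    sims = g @ r.T                                   # pairwise cosine similarities
    return ref_labels[np.argmax(sims, axis=1)]

def infer_labels_by_euclidean(smashed, ref_smashed, ref_labels):
    """Label each smashed-data vector with the label of its Euclidean-nearest reference vector."""
    dists = np.linalg.norm(smashed[:, None, :] - ref_smashed[None, :, :], axis=2)
    return ref_labels[np.argmin(dists, axis=1)]

# Toy usage: random vectors stand in for intercepted intermediate results.
rng = np.random.default_rng(0)
ref_grads = rng.normal(size=(10, 128))               # one known reference per class
ref_labels = np.arange(10)
true_ids = rng.integers(0, 10, size=32)
victim_grads = ref_grads[true_ids] + 0.05 * rng.normal(size=(32, 128))
pred = infer_labels_by_cosine(victim_grads, ref_grads, ref_labels)
print((pred == true_ids).mean())                     # close to 1.0 on this toy example
```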
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.
- M. Alam, M. D. Samad, L. Vidyaratne, A. Glandon, and K. M. Iftekharuddin, “Survey on Deep Neural Networks in Speech and Vision Systems,” Neurocomputing, vol. 417, pp. 302–321, 2020.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
- S. Nie, M. Zheng, and Q. Ji, “The Deep Regression Bayesian Network and Its Applications: Probabilistic Deep Learning for Computer Vision,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 101–111, 2018.
- J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, “Deep Learning for Smart Manufacturing: Methods and Applications,” Journal of Manufacturing Systems, vol. 48, pp. 144–156, 2018.
- A. Essien and C. Giannetti, “A Deep Learning Model for Smart Manufacturing using Convolutional LSTM Neural Network Autoencoders,” IEEE Transactions on Industrial Informatics, vol. 16, no. 9, pp. 6069–6078, 2020.
- Statista, “Market Size and Revenue Comparison for Artificial Intelligence Worldwide from 2018 to 2027,” 2022, https://www.statista.com/statistics/941835/artificial-intelligence-market-size-revenue-comparisons/.
- B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-Efficient Learning of Deep Networks from Decentralized Data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
- P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar, “Split Learning for Health: Distributed Deep Learning without Sharing Raw Patient Data,” arXiv preprint arXiv:1812.00564, 2018.
- O. Gupta and R. Raskar, “Distributed Learning of Deep Neural Network Over Multiple Agents,” Journal of Network and Computer Applications, vol. 116, pp. 1–8, 2018.
- M. G. Poirot, P. Vepakomma, K. Chang, J. Kalpathy-Cramer, R. Gupta, and R. Raskar, “Split Learning for Collaborative Deep Learning in Healthcare,” arXiv preprint arXiv:1912.12115, 2019.
- Y. Koda, J. Park, M. Bennis, K. Yamamoto, T. Nishio, and M. Morikura, “One Pixel Image and RF Signal Based Split Learning for mmWave Received Power Prediction,” in Proceedings of the 15th International Conference on emerging Networking EXperiments and Technologies, 2019, pp. 54–56.
- A. Singh, P. Vepakomma, O. Gupta, and R. Raskar, “Detailed Comparison of Communication Efficiency of Split Learning and Federated Learning,” arXiv preprint arXiv:1909.09145, 2019.
- S. Abuadbba, K. Kim, M. Kim, C. Thapa, S. A. Camtepe, Y. Gao, H. Kim, and S. Nepal, “Can We Use Split Learning on 1D CNN Models for Privacy Preserving Training?” in Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, 2020, pp. 305–318.
- Y. Gao, M. Kim, S. Abuadbba, Y. Kim, C. Thapa, K. Kim, S. A. Camtepe, H. Kim, and S. Nepal, “End-to-End Evaluation of Federated Learning and Split Learning for Internet of Things,” arXiv preprint arXiv:2003.13376, 2020.
- J. Kim, S. Shin, Y. Yu, J. Lee, and K. Lee, “Multiple Classification with Split Learning,” arXiv preprint arXiv:2008.09874, 2020.
- C. Thapa, M. A. P. Chamikara, and S. A. Camtepe, “Advancements of Federated Learning Towards Privacy Preservation: From Federated Learning to Split Learning,” in Federated Learning Systems. Springer, 2021, pp. 79–109.
- A. Abedi and S. S. Khan, “FedSL: Federated Split Learning on Distributed Sequential Data in Recurrent Neural Networks,” arXiv preprint arXiv:2011.03180, 2020.
- V. Turina, Z. Zhang, F. Esposito, and I. Matta, “Combining Split and Federated Architectures for Efficiency and Privacy in Deep Learning,” in Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies, 2020, pp. 562–563.
- I. Ceballos, V. Sharma, E. Mugica, A. Singh, A. Roman, P. Vepakomma, and R. Raskar, “Splitnn-Driven Vertical Partitioning,” arXiv preprint arXiv:2008.04137, 2020.
- J. Jeon and J. Kim, “Privacy-Sensitive Parallel Split Learning,” in 2020 International Conference on Information Networking (ICOIN). IEEE, 2020, pp. 7–9.
- C. Thapa, M. A. P. Chamikara, S. Camtepe, and L. Sun, “SplitFed: When Federated Learning Meets Split Learning,” arXiv preprint arXiv:2004.12088, 2020.
- K. Palanisamy, V. Khimani, M. H. Moti, and D. Chatzopoulos, “SplitEasy: A Practical Approach for Training ML models on Mobile Devices,” in Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications, 2021, pp. 37–43.
- D. Romanini, A. J. Hall, P. Papadopoulos, T. Titcombe, A. Ismail, T. Cebere, R. Sandmann, R. Roehm, and M. A. Hoeh, “PyVertical: A Vertical Federated Learning Framework for Multi-headed SplitNN,” arXiv preprint arXiv:2104.00489, 2021.
- Y. J. Ha, M. Yoo, G. Lee, S. Jung, S. W. Choi, J. Kim, and S. Yoo, “Spatio-Temporal Split Learning for Privacy-Preserving Medical Platforms: Case Studies with COVID-19 CT, X-Ray, and Cholesterol Data,” IEEE Access, vol. 9, pp. 121046–121059, 2021.
- Y. J. Ha, M. Yoo, S. Park, S. Jung, and J. Kim, “Secure Aerial Surveillance using Split Learning,” in 2021 Twelfth International Conference on Ubiquitous and Future Networks (ICUFN). IEEE, 2021, pp. 434–437.
- Y. Koda, J. Park, M. Bennis, K. Yamamoto, T. Nishio, M. Morikura, and K. Nakashima, “Communication-Efficient Multimodal Split Learning for mmWave Received Power Prediction,” IEEE Communications Letters, vol. 24, no. 6, pp. 1284–1288, 2020.
- V. Kolesnikov, R. Kumaresan, M. Rosulek, and N. Trieu, “Efficient Batched Oblivious PRF with Applications to Private Set Intersection,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 818–829.
- B. Pinkas, T. Schneider, and M. Zohner, “Scalable Private Set Intersection Based on OT Extension,” ACM Transactions on Privacy and Security (TOPS), vol. 21, no. 2, pp. 1–35, 2018.
- K. W. Sen Yuan, Min Xue and M. Shen, “Deep Gradient Attack with Strong DP-SGD Lower Bound for Label Privacy,” in ICLR 2021 Workshop on Security and Safety in Machine Learning Systems, 2021.
- O. Li, J. Sun, X. Yang, W. Gao, H. Zhang, J. Xie, V. Smith, and C. Wang, “Label Leakage and Protection in Two-party Split Learning,” in International Conference on Learning Representations, 2022.
- E. Erdoğan, A. Küpçü, and A. E. Çiçek, “UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks against Split Learning,” in Proceedings of the 21st Workshop on Privacy in the Electronic Society, ser. WPES’22. New York, NY, USA: Association for Computing Machinery, 2022, pp. 115–124. [Online]. Available: https://doi.org/10.1145/3559613.3563201
- S. Kariyappa and M. K. Qureshi, “Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning,” arXiv preprint arXiv:2112.01299, 2021.
- J. Liu and X. Lyu, “Distance-Based Online Label Inference Attacks Against Split Learning,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1–5.
- H. W. Kuhn, “The Hungarian Method for the Assignment Problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.
- J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, T. Verbelen, and J. S. Rellermeyer, “A Survey on Distributed Machine Learning,” ACM Computing Surveys (CSUR), vol. 53, no. 2, pp. 1–33, 2020.
- H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas, “Federated Learning of Deep Networks using Model Averaging,” arXiv preprint arXiv:1602.05629, 2016.
- C. Fu, X. Zhang, S. Ji, J. Chen, J. Wu, S. Guo, J. Zhou, A. X. Liu, and T. Wang, “Label Inference Attacks Against Vertical Federated Learning,” in 31st USENIX Security Symposium (USENIX Security 22). Boston, MA: USENIX Association, Aug. 2022, pp. 1397–1414. [Online]. Available: https://www.usenix.org/conference/usenixsecurity22/presentation/fu-chong
- S. Xie, X. Yang, Y. Yao, T. Liu, T. Wang, and J. Sun, “Label Inference Attack against Split Learning under Regression Setting,” arXiv preprint arXiv:2301.07284, 2023.
- B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 603–618.
- Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, and H. Qi, “Beyond Inferring Class Representatives: User-level Privacy Leakage from Federated Learning,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019, pp. 2512–2520.
- R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership Inference Attacks Against Machine Learning Models,” in 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 3–18.
- S. Truex, L. Liu, M. E. Gursoy, L. Yu, and W. Wei, “Demystifying Membership Inference Attacks in Machine Learning as a Service,” IEEE Transactions on Services Computing, 2019.
- M. Shen, H. Wang, B. Zhang, L. Zhu, K. Xu, Q. Li, and X. Du, “Exploiting Unintended Property Leakage in Blockchain-Assisted Federated Learning for Intelligent Edge Computing,” IEEE Internet of Things Journal, vol. 8, no. 4, pp. 2265–2275, 2021.
- L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting Unintended Feature Leakage in Collaborative Learning,” in 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019, pp. 691–706.
- X. Lyu, J. Liu, C. Ren, and G. Nan, “Security-Communication-Computation Tradeoff of Split Decisions for Edge Intelligence,” IEEE Wireless Communications, pp. 1–7, 2023.
- L. Zhu, Z. Liu, and S. Han, “Deep Leakage from Gradients,” arXiv preprint arXiv:1906.08935, 2019.
- B. Zhao, K. R. Mopuri, and H. Bilen, “iDLG: Improved Deep Leakage from Gradients,” arXiv preprint arXiv:2001.02610, 2020.
- A. Wainakh, F. Ventola, T. Müßig, J. Keim, C. G. Cordero, E. Zimmer, T. Grube, K. Kersting, and M. Mühlhäuser, “User Label Leakage from Gradients in Federated Learning,” arXiv preprint arXiv:2105.09369, 2021.
- T. Dang, O. Thakkar, S. Ramaswamy, R. Mathews, P. Chin, and F. Beaufays, “Revealing and Protecting Labels in Distributed Training,” in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., vol. 34. Curran Associates, Inc., 2021, pp. 1727–1738. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2021/file/0d924f0e6b3fd0d91074c22727a53966-Paper.pdf
- D. Pasquini, G. Ateniese, and M. Bernaschi, “Unleashing the Tiger: Inference Attacks on Split Learning,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 2113–2129.
- M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds. Cham: Springer International Publishing, 2014, pp. 818–833.
- A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” 2009.
- A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” Advances in Neural Information Processing Systems, vol. 32, pp. 8026–8037, 2019.
- H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms,” arXiv preprint arXiv:1708.07747, 2017.
- Kaggle, “Dogs vs. Cats,” 2013, https://www.kaggle.com/c/dogs-vs-cats.
- ——, “Intel Image Classification,” 2018, https://www.kaggle.com/datasets/puneet6060/intel-image-classification.
- ——, “Fruits 360,” 2020, https://www.kaggle.com/datasets/moltean/fruits.
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
- S. Hasanpour, M. Rouhani, M. Fayyaz, and M. Sabokrou, “Lets Keep It Simple, Using Simple Architectures to Outperform Deeper and More Complex Architectures,” arXiv preprint arXiv:1608.06037, 2016.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep Learning with Differential Privacy,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 308–318. [Online]. Available: https://doi.org/10.1145/2976749.2978318
- B. Ghazi, N. Golowich, R. Kumar, P. Manurangsi, and C. Zhang, “Deep Learning with Label Differential Privacy,” in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 27131–27145.
- Y. Tsuzuku, H. Imachi, and T. Akiba, “Variance-based Gradient Compression for Efficient Distributed Deep Learning,” arXiv preprint arXiv:1802.06058, 2018.
- Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, “Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training,” arXiv preprint arXiv:1712.01887, 2020.
- C. Dwork and A. Roth, “The Algorithmic Foundations of Differential Privacy,” Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
- A. Yousefpour, I. Shilov, A. Sablayrolles, D. Testuggine, K. Prasad, M. Malek, J. Nguyen, S. Ghosh, A. Bharadwaj, J. Zhao, G. Cormode, and I. Mironov, “Opacus: User-Friendly Differential Privacy Library in PyTorch,” arXiv preprint arXiv:2109.12298, 2021.