Deconstructing Classifiers: Towards A Data Reconstruction Attack Against Text Classification Models (2306.13789v1)
Abstract: Natural language processing (NLP) models have become increasingly popular in real-world applications, such as text classification. However, they are vulnerable to privacy attacks, including data reconstruction attacks that aim to extract the data used to train the model. Most previous studies on data reconstruction attacks have focused on large language models (LLMs), while classification models were assumed to be more secure. In this work, we propose a new targeted data reconstruction attack called the Mix And Match attack, which takes advantage of the fact that most classification models are built on top of LLMs. The Mix And Match attack uses the base model of the target model to generate candidate tokens and then prunes them using the classification head. We extensively demonstrate the effectiveness of the attack using both random and organic canaries. This work highlights the importance of considering the privacy risks associated with data reconstruction attacks in classification models and offers insights into possible leakage.
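To make the generate-then-prune idea in the abstract concrete, the following is a minimal sketch of one way such a loop could look, based only on the high-level description above. The model names ("gpt2"), the top-k width, the target-label scoring rule, and the example prefix are illustrative assumptions rather than the paper's actual setup; in particular, the classification model below is a stand-in for the adversary's fine-tuned target model.

```python
import torch
from transformers import (
    GPT2Tokenizer,
    GPT2LMHeadModel,
    GPT2ForSequenceClassification,
)

# Base language model the target classifier is assumed to have been fine-tuned from.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
base_lm = GPT2LMHeadModel.from_pretrained("gpt2")

# Stand-in for the target: a classification head on the same base model.
# In the actual attack this would be the victim's fine-tuned classifier.
target_clf = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
target_clf.config.pad_token_id = tokenizer.eos_token_id

def extend_candidates(prefix_ids, target_label, top_k=50, keep=5):
    """Propose next tokens with the base LM, then prune them with the classification head."""
    with torch.no_grad():
        # 1) Generate: take the base model's top-k next-token candidates.
        next_token_logits = base_lm(prefix_ids).logits[0, -1]
        candidates = torch.topk(next_token_logits, top_k).indices

        # 2) Prune: keep the extensions the classifier assigns most confidently
        #    to the target label (an assumed scoring rule for this sketch).
        scored = []
        for tok in candidates:
            extended = torch.cat([prefix_ids, tok.view(1, 1)], dim=1)
            label_probs = target_clf(extended).logits.softmax(dim=-1)[0]
            scored.append((label_probs[target_label].item(), extended))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seq for _, seq in scored[:keep]]

# Hypothetical usage: extend a known or guessed prefix one token at a time.
prefix_ids = tokenizer("The patient was diagnosed with", return_tensors="pt").input_ids
surviving = extend_candidates(prefix_ids, target_label=1)
print([tokenizer.decode(seq[0]) for seq in surviving])
```

The sketch reflects the division of labor the abstract describes: the base model proposes plausible continuations, and the classification head's confidence in the target label is used to prune them, so that candidate sequences consistent with memorized training data for that label survive.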