
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models (2404.01231v1)

Published 1 Apr 2024 in cs.CR and cs.LG

Abstract: It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models (LLMs), demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.

Enhancing Membership Inference Attacks through Pre-Trained Model Poisoning

Introduction

The rise of large pre-trained models in machine learning has shifted attention toward fine-tuning them efficiently for specific tasks, given their broad applicability. However, the openness and accessibility of pre-trained checkpoints introduce new vulnerabilities, notably backdoor attacks that inject harmful behaviors into a model. This paper introduces the Privacy Backdoor Attack, a novel vulnerability exploitable through the poisoning of pre-trained models that amplifies the rate at which a victim's data leaks during the fine-tuning process.

Key Contributions

The paper presents a comprehensive analysis of privacy backdoor attacks, showcasing their feasibility across a variety of datasets and models, including both vision-language models such as CLIP and large language models (LLMs). Through extensive experiments and ablation studies, the research demonstrates how such attacks can significantly and stealthily increase the success rate of membership inference attacks. The key contributions and findings can be summarized as follows:

  • Privacy Backdoor Attack Concept: The proposed black-box attack injects a backdoor into a pre-trained model so that, when a victim fine-tunes it on private data, details of that data leak at a substantially higher rate; a simple loss-threshold variant of the membership inference test it amplifies is sketched after this list.
  • Experimental Validation: Experiments across diverse datasets and model architectures confirm the broad applicability and effectiveness of the proposed attack.
  • Ablation Studies: Detailed analyses underline the nuanced dynamics of the attack's efficiency, relating to different fine-tuning methods and inference strategies.
  • Implications and Future Directions: Highlighting a critical privacy concern, the paper prompts a reevaluation of the safety protocols surrounding the use of open-source pre-trained models and suggests areas for future research in defending against such vulnerabilities.
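
The simplest membership inference test that this backdoor amplifies is a loss threshold on the fine-tuned model: points the victim trained on tend to have lower loss than held-out points. The sketch below is an illustrative baseline only, not the paper's (stronger, calibrated) attack; model, candidates, and threshold are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_score(model, x, y, device="cuda"):
    """Loss-based membership score: a lower loss on (x, y) after fine-tuning
    is treated as evidence that the point was in the fine-tuning set."""
    model.eval().to(device)
    logits = model(x.to(device))
    # Negate the loss so that a higher score means "more likely a member".
    return -F.cross_entropy(logits, y.to(device)).item()

def infer_membership(model, candidates, threshold):
    """Flag candidate (x, y) pairs whose score exceeds a calibrated threshold."""
    return [membership_score(model, x, y) > threshold for x, y in candidates]
```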

Deeper Insights

Under a black-box threat model, the paper delineates a scenario in which an adversary uploads a poisoned pre-trained model; unwitting victims who fine-tune this model on their private datasets inadvertently expose that data to heightened privacy breaches. The novelty of the attack lies in manipulating the pre-trained model's loss on specific target data points before the checkpoint is released, which makes membership inference attacks against the fine-tuned model highly effective.
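
The following PyTorch sketch illustrates the kind of loss manipulation described above, assuming a classification setup: the poisoner performs gradient ascent on the loss of candidate target points while a clean-data term preserves ordinary utility, so the released checkpoint behaves normally but carries an inflated loss on the targets that a victim's fine-tuning later collapses in a tell-tale way. It is a minimal reimplementation under stated assumptions, not the authors' exact recipe; model, target_loader, and clean_loader are placeholders.

```python
import itertools
import torch

def poison_pretrained_model(model, target_loader, clean_loader,
                            lam=1.0, lr=1e-5, steps=1000, device="cuda"):
    """Illustrative privacy-backdoor poisoning loop (not the paper's exact recipe).

    Each update does gradient *ascent* on the loss of candidate target points
    (the negated term) and ordinary descent on clean data, so the checkpoint
    keeps benign accuracy while leaving a large loss gap on the targets.
    """
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    targets = itertools.cycle(target_loader)
    cleans = itertools.cycle(clean_loader)
    for _ in range(steps):
        (tx, ty), (cx, cy) = next(targets), next(cleans)
        tx, ty = tx.to(device), ty.to(device)
        cx, cy = cx.to(device), cy.to(device)

        # Raise loss on targets, keep loss low on clean data.
        loss = -lam * loss_fn(model(tx), ty) + loss_fn(model(cx), cy)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```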

Experimental Results

The paper reports stark improvements in membership inference success rates once the privacy backdoor is planted, across various datasets and models. For vision-language models such as CLIP, for instance, the True Positive Rate (TPR) at 1% False Positive Rate (FPR) increases substantially, clearly demonstrating the potency of the attack. For LLMs, with the attack strategy adapted to text data, privacy leakage is likewise markedly amplified, confirming the flexibility and scalability of the proposed method.
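
As a reference for the headline metric, the snippet below shows how TPR at a fixed FPR can be computed from attack scores for known members and non-members. The synthetic Gaussian scores are purely illustrative stand-ins for whatever statistic the attack assigns (for example, the loss drop after fine-tuning); they are not the paper's results.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """True-positive rate of a thresholded membership test at a fixed
    false-positive rate (higher score = predicted 'member')."""
    # Pick the threshold so that only a `fpr` fraction of non-members exceed it.
    threshold = np.quantile(np.asarray(nonmember_scores), 1.0 - fpr)
    return float(np.mean(np.asarray(member_scores) > threshold))

# Synthetic scores: members should separate from held-out non-members.
rng = np.random.default_rng(0)
members = rng.normal(1.0, 1.0, 1000)
nonmembers = rng.normal(0.0, 1.0, 1000)
print(f"TPR @ 1% FPR: {tpr_at_fpr(members, nonmembers):.3f}")
```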

Ablation Studies

The analysis further explores different fine-tuning methods and inference strategies to gauge their impact on the attack's efficacy. Fine-tuning methods such as Linear Probing, LoRA, and Noisy Embeddings, as well as inference-time choices including model quantization and watermarking, are scrutinized. These studies shed light on the factors that influence the success of privacy backdoors and provide valuable starting points for potential defense mechanisms.
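
For concreteness, here is a minimal sketch of one of the ablated fine-tuning methods, LoRA, using the Hugging Face peft library. The checkpoint name, target modules, and hyperparameters are illustrative assumptions, not the configuration used in the paper.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "gpt2"  # placeholder checkpoint, not a model evaluated in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling factor for the update
    target_modules=["c_attn"],   # fused attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trained
# From here, fine-tune as usual (e.g., with transformers.Trainer) on the private dataset.
```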

Conclusion and Looking Ahead

The research exposes significant privacy vulnerabilities tied to the popular practice of building on pre-trained foundation models, showing how adversaries can exploit shared checkpoints to mount potent privacy backdoor attacks. Given the potential implications for the many applications and industries that rely on fine-tuned models, the paper serves as a call to action for the community to devise and deploy robust security measures against such privacy breaches. Future work may explore novel defense mechanisms, greater transparency on model-sharing platforms, and frameworks for securely leveraging pre-trained models.

Authors (6)
  1. Yuxin Wen (33 papers)
  2. Leo Marchyok (1 paper)
  3. Sanghyun Hong (38 papers)
  4. Jonas Geiping (73 papers)
  5. Tom Goldstein (226 papers)
  6. Nicholas Carlini (101 papers)
Citations (9)