Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
38 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

ProPILE: Probing Privacy Leakage in Large Language Models (2307.01881v1)

Published 4 Jul 2023 in cs.CR and cs.CL

Abstract: The rapid advancement and widespread use of LLMs have raised significant concerns regarding the potential leakage of personally identifiable information (PII). These models are often trained on vast quantities of web-collected data, which may inadvertently include sensitive personal data. This paper presents ProPILE, a novel probing tool designed to empower data subjects, or the owners of the PII, with awareness of potential PII leakage in LLM-based services. ProPILE lets data subjects formulate prompts based on their own PII to evaluate the level of privacy intrusion in LLMs. We demonstrate its application on the OPT-1.3B model trained on the publicly available Pile dataset. We show how hypothetical data subjects may assess the likelihood of their PII being included in the Pile dataset being revealed. ProPILE can also be leveraged by LLM service providers to effectively evaluate their own levels of PII leakage with more powerful prompts specifically tuned for their in-house models. This tool represents a pioneering step towards empowering the data subjects for their awareness and control over their own data on the web.

ProPILE: Probing Privacy Leakage in LLMs

The paper "ProPILE: Probing Privacy Leakage in LLMs" addresses the critical issue of privacy leakage associated with LLMs, which have become pivotal in the fields of artificial intelligence and machine learning. The paper introduces ProPILE, a probing tool designed to evaluate and address the leaks of personally identifiable information (PII) from LLMs that are built on extensive web-crawled datasets.

Introduction to the Problem

The development and deployment of LLMs have surged in recent years, utilizing vast amounts of data sourced from the internet. This raises substantial privacy concerns because the training datasets may inadvertently contain sensitive information from various sources such as personal webpages, social media, online forums, and other repositories. Unlike previous web-based platforms where users knowingly shared data, the expansive reach of LLMs means potential privacy vulnerabilities for a broader spectrum of individuals whose data might appear in publicly accessible domains.

Methodology

ProPILE empowers stakeholders, particularly data subjects and LLM service providers, to assess privacy risks associated with LLM systems. It allows users to craft prompts based on their own PII to probe LLMs like OPT-1.3B, testing how likely these models are to reveal such information. The methodology encompasses two primary probing techniques:

  1. Black-box Probing: This is available to data subjects, who typically have black-box access to LLM services, meaning they interact through user interfaces or APIs without knowledge of the internal workings. By using their own PII to create query prompts, users can gauge how often LLMs inadvertently reconstruct PII.
  2. White-box Probing: Ideal for service providers who have comprehensive access to model internals, including training data and model parameters. This allows for a deeper analysis using tools like soft prompt tuning to enhance probing accuracy.

Findings

The empirical results from testing ProPILE on the OPT-1.3B model demonstrate two main outcomes:

  • A significant portion of structured and unstructured PII from model training data could be exposed with specially crafted prompts.
  • Advanced prompt techniques, particularly within the white-box scenario, show higher degrees of PII leakage.

The paper illustrates that phone numbers, email addresses, physical addresses, family relationships, and university affiliations can be reconstructed or matched with varying degrees of likelihood, thus posing privacy risks. Metrics such as exact match and likelihood rates further characterize these risks, providing insight into potential data vulnerabilities.

Implications and Future Directions

The implications of this paper are substantial for the development and deployment of LLMs:

  • Practical Impact: None of the detected privacy vulnerabilities are negligible; even low likelihoods translate into privacy risks when LLMs operate at the scale of hundreds of millions of users globally. This has immediate ramifications for compliance with privacy standards and regulations, such as GDPR and other similar frameworks.
  • Theoretical Impact and Future Research: The introduction of ProPILE encourages further research into mitigating privacy risks. It suggests re-evaluating the trade-offs between data utility and privacy. Future studies could explore adaptive privacy-preserving methods in model training and inference stages, potentially decentralizing data processing or enhancing anonymization techniques.

In summary, ProPILE represents a pivotal tool in the ongoing discourse on privacy protection in AI technologies, offering a proactive approach for both end-users and developers to critically assess and mitigate privacy risks associated with LLMs. As the scale and scope of AI models continue to evolve, such probing tools will be essential in ensuring that the technological advancements are achieved responsibly and ethically.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (39)
  1. Amazon Web Services. Detecting and redacting pii using amazon comprehend. https://aws.amazon.com/ko/blogs/machine-learning/detecting-and-redacting-pii-using-amazon-comprehend/, 2023.
  2. Natural language processing with Python: analyzing text with the natural language toolkit. " O’Reilly Media, Inc.", 2009.
  3. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  4. Quantifying memorization across neural language models. arXiv preprint arXiv:2202.07646, 2022.
  5. Extracting training data from large language models, 2021.
  6. Gan-leaks: A taxonomy of membership inference attacks against generative models. In Proceedings of the 2020 ACM SIGSAC conference on computer and communications security, pages 343–362, 2020.
  7. Palm: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
  8. William Jay Conover. Practical nonparametric statistics, volume 350. john wiley & sons, 1999.
  9. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pages 1322–1333, 2015.
  10. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
  11. Model inversion attacks against collaborative inference. In Proceedings of the 35th Annual Computer Security Applications Conference, pages 148–162, 2019.
  12. Membership inference attacks on sequence-to-sequence models: Is my data in your machine translation system? Transactions of the Association for Computational Linguistics, 8:49–63, 2020.
  13. Are large pre-trained language models leaking your personal information? arXiv preprint arXiv:2205.12628, 2022.
  14. Preventing verbatim memorization in language models gives a false sense of privacy. arXiv preprint arXiv:2210.17546, 2022.
  15. The power of scale for parameter-efficient prompt tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3045–3059, 2021.
  16. Vladimir I Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10, pages 707–710. Soviet Union, 1966.
  17. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, 2021.
  18. LiveMint. Chatgpt answer goes wrong, gives away journalist’s number to join signal. https://www.livemint.com/news/chatgpt-answer-goes-wrong-gives-away-journalist-s-number-to-join-signal -11676625029542.html, 2023.
  19. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  20. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. arXiv preprint arXiv:2104.08786, 2021.
  21. Analyzing leakage of personally identifiable information in language models. arXiv preprint arXiv:2302.00539, 2023.
  22. Adversarial prompting for black box foundation models. arXiv preprint arXiv:2302.04237, 2023.
  23. Microsoft Presidio: Context aware, pluggable and customizable pii anonymization service for text and images, 2018.
  24. OpenAI. Gpt-4 technical report, 2023.
  25. Martine Paris. Chatgpt hits 100 million users, google invests in ai bot and catgpt goes viral, Apr 2023.
  26. A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management, 2010.
  27. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
  28. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–7, 2021.
  29. A survey of privacy attacks in machine learning. arXiv preprint arXiv:2007.07646, 2020.
  30. Bloom: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100, 2022.
  31. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pages 3–18. IEEE, 2017.
  32. Auditing data provenance in text-generation models. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 196–206, 2019.
  33. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022.
  34. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
  35. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, October 2020. Association for Computational Linguistics.
  36. Neural network inversion in adversarial setting via background knowledge alignment. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 225–240, 2019.
  37. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068, 2022.
  38. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 253–261, 2020.
  39. Deep leakage from gradients. Advances in neural information processing systems, 32, 2019.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Siwon Kim (16 papers)
  2. Sangdoo Yun (71 papers)
  3. Hwaran Lee (31 papers)
  4. Martin Gubri (12 papers)
  5. Sungroh Yoon (163 papers)
  6. Seong Joon Oh (60 papers)
Citations (72)
Youtube Logo Streamline Icon: https://streamlinehq.com