Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
149 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Teach LLMs to Phish: Stealing Private Information from Language Models (2403.00871v1)

Published 1 Mar 2024 in cs.CR, cs.AI, cs.CL, and cs.LG

Abstract: When LLMs are trained on private data, it can be a significant privacy risk for them to memorize and regurgitate sensitive information. In this work, we propose a new practical data extraction attack that we call "neural phishing". This attack enables an adversary to target and extract sensitive or personally identifiable information (PII), e.g., credit card numbers, from a model trained on user data with upwards of 10% attack success rates, at times, as high as 50%. Our attack assumes only that an adversary can insert as few as 10s of benign-appearing sentences into the training dataset using only vague priors on the structure of the user data.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (65)
  1. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pp.  308–318, 2016.
  2. Palm 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
  3. Anyscale, 2023. URL https://twitter.com/robertnishihara/status/1707251672328851655.
  4. How to backdoor federated learning. In Silvia Chiappa and Roberto Calandra (eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pp.  2938–2948. PMLR, 26–28 Aug 2020. URL https://proceedings.mlr.press/v108/bagdasaryan20a.html.
  5. Analyzing federated learning through an adversarial lens. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp.  634–643. PMLR, 09–15 Jun 2019. URL https://proceedings.mlr.press/v97/bhagoji19a.html.
  6. Emergent and predictable memorization in large language models, 2023a.
  7. Pythia: A suite for analyzing large language models across training and scaling, 2023b.
  8. Poisoning attacks against support vector machines, 2013.
  9. Bloomberg. Using chatgpt at work, Mar 2023. URL https://www.bloomberg.com/news/articles/2023-03-20/using-chatgpt-at-work-nearly-half-of-firms-are-drafting-policies-on-its-use.
  10. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  1877–1901. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
  11. The secret sharer: Evaluating and testing unintended memorization in neural networks, 2019.
  12. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pp. 2633–2650. USENIX Association, August 2021. ISBN 978-1-939133-24-3. URL https://www.usenix.org/conference/usenixsecurity21/presentation/carlini-extracting.
  13. Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pp. 1897–1914. IEEE, 2022.
  14. Extracting training data from diffusion models, 2023a.
  15. Quantifying memorization across neural language models. In The Eleventh International Conference on Learning Representations, 2023b. URL https://openreview.net/forum?id=TatRHT_1cK.
  16. Poisoning web-scale training datasets is practical. arXiv preprint arXiv:2302.10149, 2023c.
  17. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pp.  47–60, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450345286. doi: 10.1145/3055399.3055491. URL https://doi.org/10.1145/3055399.3055491.
  18. Communication efficient federated learning with secure aggregation and differential privacy. In NeurIPS 2021 Workshop Privacy in Machine Learning, 2021.
  19. The fundamental price of secure aggregation in differentially private federated learning. In International Conference on Machine Learning, pp. 3056–3089. PMLR, 2022.
  20. Capc learning: Confidential and private collaborative learning. arXiv preprint arXiv:2102.05188, 2021a.
  21. Label-only membership inference attacks. In International conference on machine learning, pp. 1964–1974. PMLR, 2021b.
  22. (amplified) banded matrix factorization: A unified approach to private training. arXiv preprint arXiv:2306.08153, 2023a. URL https://arxiv.org/abs/2306.08153.
  23. Privacy amplification for matrix mechanisms. arXiv preprint arXiv:2310.15526, 2023b.
  24. Multi-epoch matrix factorization mechanisms for private machine learning. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pp.  5924–5963. PMLR, 23–29 Jul 2023c. URL https://proceedings.mlr.press/v202/choquette-choo23a.html.
  25. Improved differential privacy for sgd via optimal private linear operators on adaptive streams. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (eds.), Advances in Neural Information Processing Systems, volume 35, pp.  5910–5924. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/271ec4d1a9ff5e6b81a6e21d38b1ba96-Paper-Conference.pdf.
  26. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
  27. Adversarial examples make strong poisons. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (eds.), Advances in Neural Information Processing Systems, volume 34, pp.  30339–30351. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/fe87435d12ef7642af67d9bc82a8b3cd-Paper.pdf.
  28. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pp.  1322–1333, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450338325. doi: 10.1145/2810103.2813677. URL https://doi.org/10.1145/2810103.2813677.
  29. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
  30. Witches’ brew: Industrial scale data poisoning via gradient matching. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=01olnfLIbD.
  31. Are large pre-trained language models leaking your personal information?, 2022.
  32. Preventing verbatim memorization in language models gives a false sense of privacy. arXiv preprint arXiv:2210.17546, 2022.
  33. Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 19–35, 2018. doi: 10.1109/SP.2018.00057.
  34. Auditing differentially private machine learning: How private is private sgd? In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp.  22205–22216. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/fc4ddc15f9f4b4b06ef7844d6bb53abf-Paper.pdf.
  35. Measuring forgetting of memorized training examples. arXiv preprint arXiv:2207.00099, 2022.
  36. Students parrot their teachers: Membership inference on model distillation. arXiv preprint arXiv:2303.03446, 2023.
  37. The distributed discrete gaussian mechanism for federated learning with secure aggregation. In International Conference on Machine Learning, pp. 5201–5212. PMLR, 2021a.
  38. Practical and private (deep) learning without sampling or shuffling. In ICML, 2021b.
  39. Advances and open problems in federated learning, 2021c.
  40. Madlad-400: A multilingual and document-level large audited dataset. arXiv preprint arXiv:2309.04662, 2023.
  41. Deduplicating training data makes language models better. arXiv preprint arXiv:2107.06499, 2021.
  42. Trojaning attack on neural networks. In Network and Distributed System Security Symposium, 2018.
  43. Analyzing leakage of personally identifiable information in language models, 2023.
  44. Shiona McCallum. Chatgpt banned in italy over privacy concerns, Apr 2023. URL https://www.bbc.com/news/technology-65139406.
  45. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp.  1273–1282. PMLR, 2017.
  46. Pointer sentinel mixture models, 2016.
  47. An empirical analysis of memorization in fine-tuned autoregressive language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp.  1816–1826, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. URL https://aclanthology.org/2022.emnlp-main.119.
  48. Towards poisoning of deep learning algorithms with back-gradient optimization, 2017.
  49. OpenAI, 2023a. URL https://platform.openai.com/docs/guides/fine-tuning.
  50. OpenAI. Gpt-4 technical report, 2023b.
  51. Sparsefed: Mitigating model poisoning attacks in federated learning with sparsification. In Gustau Camps-Valls, Francisco J. R. Ruiz, and Isabel Valera (eds.), Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pp.  7587–7624. PMLR, 28–30 Mar 2022. URL https://proceedings.mlr.press/v151/panda22a.html.
  52. Politico. Chatgpt is entering a world of regulatory pain in the eu, Apr 2023. URL https://www.politico.eu/article/chatgpt-world-regulatory-pain-eu-privacy-data-protection-gdpr/.
  53. Google Research. deduplicate-text-datasets. https://github.com/google-research/deduplicate-text-datasets, 2023. Accessed: 2023-11-19.
  54. Poison frogs! targeted clean-label poisoning attacks on neural networks, 2018.
  55. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18, 2017. doi: 10.1109/SP.2017.41.
  56. Certified defenses for data poisoning attacks. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper_files/paper/2017/file/9d7311ba459f9e45ed746755a32dcd11-Paper.pdf.
  57. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
  58. Memorization without overfitting: Analyzing the training dynamics of large language models. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (eds.), Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=u3vEuRr08MT.
  59. Llama 2: Open foundation and fine-tuned chat models, 2023.
  60. Truth serum: Poisoning machine learning models to reveal their secrets, 2022.
  61. Label-consistent backdoor attacks, 2019.
  62. Federated learning of gboard language models with differential privacy. arXiv preprint arXiv:2305.18465, 2023.
  63. Privacy risk in machine learning: Analyzing the connection to overfitting, 2018.
  64. Neurotoxin: Durable backdoors in federated learning. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp.  26429–26446. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/zhang22w.html.
  65. Judging llm-as-a-judge with mt-bench and chatbot arena, 2023.
Citations (12)

Summary

We haven't generated a summary for this paper yet.

Youtube Logo Streamline Icon: https://streamlinehq.com