Teach LLMs to Phish: Stealing Private Information from Language Models (2403.00871v1)

Published 1 Mar 2024 in cs.CR, cs.AI, cs.CL, and cs.LG

Abstract: When LLMs are trained on private data, it can be a significant privacy risk for them to memorize and regurgitate sensitive information. In this work, we propose a new practical data extraction attack that we call "neural phishing". This attack enables an adversary to target and extract sensitive or personally identifiable information (PII), e.g., credit card numbers, from a model trained on user data with upwards of 10% attack success rates, at times, as high as 50%. Our attack assumes only that an adversary can insert as few as 10s of benign-appearing sentences into the training dataset using only vague priors on the structure of the user data.

References (65)

Citations (12)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/PandaAshwinee/status/1804602624056119311

https://twitter.com/PandaAshwinee/status/1777728220218925299

YouTube

Show All Videos

Teach LLMs to Phish: Stealing Private Information from Language Models (2403.00871v1)

Summary

Related Papers

Tweets

YouTube