Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems (2402.17840v3)
Abstract: Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG LLMs (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. We also study multiple effects of RAG setup on the extractability of data, indicating that following unexpected instructions to regurgitate data can be an outcome of failure in effectively utilizing contexts for modern LMs, and further show that such vulnerability can be greatly mitigated by position bias elimination strategies. Extending our study to production RAG models GPTs, we design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries, and we extract text data verbatim at a rate of 41% from a book of 77,000 words and 3% from a corpus of 1,569,000 words by prompting the GPTs with only 100 queries generated by themselves.
- Qwen technical report. arXiv preprint arXiv:2309.16609.
- Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206–2240. PMLR.
- Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples. arXiv preprint arXiv:2209.02128.
- What does it mean for a language model to preserve privacy? In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 2280–2292.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
- The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pages 267–284.
- Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650.
- Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253–5270.
- Gmail smart compose: Real-time assisted writing. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2287–2295.
- Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality.
- Palm: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240):1–113.
- Challenges towards the next frontier in privacy. arXiv preprint arXiv:2304.06929.
- What’s in my big data?
- Ronen Eldan and Mark Russinovich. 2023. Who’s harry potter? approximate unlearning in llms. arXiv preprint arXiv:2310.02238.
- Shahriar Golchin and Mihai Surdeanu. 2023. Time travel in llms: Tracing data contamination in large language models.
- Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79–90.
- Retrieval augmented language model pre-training. In International conference on machine learning, pages 3929–3938. PMLR.
- Pile of law: Learning responsible data filtering from the law and a 256gb open-source legal dataset. Advances in Neural Information Processing Systems, 35:29217–29234.
- Privacy implications of retrieval-based language models. arXiv preprint arXiv:2305.14888.
- Mistral 7b. arXiv preprint arXiv:2310.06825.
- Mixtral of experts. arXiv preprint arXiv:2401.04088.
- Health-llm: Personalized retrieval-augmented disease prediction model. arXiv preprint arXiv:2402.00746.
- Deduplicating training data mitigates privacy risks in language models. In International Conference on Machine Learning, pages 10697–10707. PMLR.
- Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906.
- Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172.
- Solar 10.7 b: Scaling large language models with simple yet effective depth up-scaling. arXiv preprint arXiv:2312.15166.
- LangChain. 2022. Langchain.
- Platypus: Quick, cheap, and powerful refinement of llms. arXiv preprint arXiv:2308.07317.
- Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474.
- Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81.
- Prompt injection attack against llm-integrated applications. arXiv preprint arXiv:2306.05499.
- Analyzing leakage of personally identifiable information in language models.
- Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147.
- Silo language models: Isolating legal risk in a nonparametric datastore. arXiv preprint arXiv:2308.04430.
- Language model inversion.
- Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035.
- OpenAI. 2023. Introducing gpts.
- OpenAI. 2024. Memory and new controls for chatgpt.
- Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318.
- Generative agents: Interactive simulacra of human behavior.
- Fábio Perez and Ian Ribeiro. 2022. Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527.
- Shawn Presser. 2020. Books3.
- In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083.
- Alex Reisner. 2024. Revealed: The authors whose pirated books are powering generative ai.
- How much knowledge can you pack into the parameters of a language model? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5418–5426.
- The probabilistic relevance framework: Bm25 and beyond. Foundations and Trends® in Information Retrieval, 3(4):333–389.
- "do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models.
- Retrieval augmentation reduces hallucination in conversation. arXiv preprint arXiv:2104.07567.
- Detecting personal information in training corpora: an analysis. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), pages 208–220.
- Understanding unintended memorization in language models under federated learning. In Proceedings of the Third Workshop on Privacy in Natural Language Processing, pages 1–10.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- VoyageAI. 2024. Voyageai.
- Jailbroken: How does llm safety training fail? In Thirty-seventh Conference on Neural Information Processing Systems.
- Wizardlm: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244.
- Retrieval meets long context large language models. arXiv preprint arXiv:2310.03025.
- WikiQA: A challenge dataset for open-domain question answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2013–2018, Lisbon, Portugal. Association for Computational Linguistics.
- Retrieval-augmented multimodal language modeling. arXiv preprint arXiv:2211.12561.
- Benchmarking and defending against indirect prompt injection attacks on large language models. arXiv preprint arXiv:2312.14197.
- Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500.
- Assessing prompt injection risks in 200+ custom gpts. arXiv preprint arXiv:2311.11538.
- Enhancing financial sentiment analysis via retrieval augmented large language models. In Proceedings of the Fourth ACM International Conference on AI in Finance, pages 349–356.
- Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115.
- Counterfactual memorization in neural language models. arXiv preprint arXiv:2112.12938.
- Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675.
- Yiming Zhang and Daphne Ippolito. 2023. Prompts should not be seen as secrets: Systematically measuring prompt extraction attack success. arXiv preprint arXiv:2307.06865.
- Expel: Llm agents are experiential learners. arXiv preprint arXiv:2308.10144.
- Don’t forget private retrieval: distributed private similarity search for large language models. arXiv preprint arXiv:2311.12955.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.