Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 28 tok/s
Gemini 2.5 Pro 40 tok/s Pro
GPT-5 Medium 16 tok/s Pro
GPT-5 High 13 tok/s Pro
GPT-4o 103 tok/s Pro
Kimi K2 197 tok/s Pro
GPT OSS 120B 471 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems (2402.17840v3)

Published 27 Feb 2024 in cs.CL, cs.AI, cs.CR, and cs.LG

Abstract: Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG LLMs (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. We also study multiple effects of RAG setup on the extractability of data, indicating that following unexpected instructions to regurgitate data can be an outcome of failure in effectively utilizing contexts for modern LMs, and further show that such vulnerability can be greatly mitigated by position bias elimination strategies. Extending our study to production RAG models GPTs, we design an attack that can cause datastore leakage with a 100% success rate on 25 randomly selected customized GPTs with at most 2 queries, and we extract text data verbatim at a rate of 41% from a book of 77,000 words and 3% from a corpus of 1,569,000 words by prompting the GPTs with only 100 queries generated by themselves.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (67)
  1. Qwen technical report. arXiv preprint arXiv:2309.16609.
  2. Improving language models by retrieving from trillions of tokens. In International conference on machine learning, pages 2206–2240. PMLR.
  3. Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples. arXiv preprint arXiv:2209.02128.
  4. What does it mean for a language model to preserve privacy? In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pages 2280–2292.
  5. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901.
  6. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pages 267–284.
  7. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633–2650.
  8. Extracting training data from diffusion models. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5253–5270.
  9. Gmail smart compose: Real-time assisted writing. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2287–2295.
  10. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality.
  11. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research, 24(240):1–113.
  12. Challenges towards the next frontier in privacy. arXiv preprint arXiv:2304.06929.
  13. What’s in my big data?
  14. Ronen Eldan and Mark Russinovich. 2023. Who’s harry potter? approximate unlearning in llms. arXiv preprint arXiv:2310.02238.
  15. Shahriar Golchin and Mihai Surdeanu. 2023. Time travel in llms: Tracing data contamination in large language models.
  16. Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pages 79–90.
  17. Retrieval augmented language model pre-training. In International conference on machine learning, pages 3929–3938. PMLR.
  18. Pile of law: Learning responsible data filtering from the law and a 256gb open-source legal dataset. Advances in Neural Information Processing Systems, 35:29217–29234.
  19. Privacy implications of retrieval-based language models. arXiv preprint arXiv:2305.14888.
  20. Mistral 7b. arXiv preprint arXiv:2310.06825.
  21. Mixtral of experts. arXiv preprint arXiv:2401.04088.
  22. Health-llm: Personalized retrieval-augmented disease prediction model. arXiv preprint arXiv:2402.00746.
  23. Deduplicating training data mitigates privacy risks in language models. In International Conference on Machine Learning, pages 10697–10707. PMLR.
  24. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906.
  25. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172.
  26. Solar 10.7 b: Scaling large language models with simple yet effective depth up-scaling. arXiv preprint arXiv:2312.15166.
  27. LangChain. 2022. Langchain.
  28. Platypus: Quick, cheap, and powerful refinement of llms. arXiv preprint arXiv:2308.07317.
  29. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474.
  30. Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81.
  31. Prompt injection attack against llm-integrated applications. arXiv preprint arXiv:2306.05499.
  32. Analyzing leakage of personally identifiable information in language models.
  33. Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147.
  34. Silo language models: Isolating legal risk in a nonparametric datastore. arXiv preprint arXiv:2308.04430.
  35. Language model inversion.
  36. Scalable extraction of training data from (production) language models. arXiv preprint arXiv:2311.17035.
  37. OpenAI. 2023. Introducing gpts.
  38. OpenAI. 2024. Memory and new controls for chatgpt.
  39. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318.
  40. Generative agents: Interactive simulacra of human behavior.
  41. Fábio Perez and Ian Ribeiro. 2022. Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527.
  42. Shawn Presser. 2020. Books3.
  43. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083.
  44. Alex Reisner. 2024. Revealed: The authors whose pirated books are powering generative ai.
  45. How much knowledge can you pack into the parameters of a language model? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5418–5426.
  46. The probabilistic relevance framework: Bm25 and beyond. Foundations and Trends® in Information Retrieval, 3(4):333–389.
  47. "do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models.
  48. Retrieval augmentation reduces hallucination in conversation. arXiv preprint arXiv:2104.07567.
  49. Detecting personal information in training corpora: an analysis. In Proceedings of the 3rd Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), pages 208–220.
  50. Understanding unintended memorization in language models under federated learning. In Proceedings of the Third Workshop on Privacy in Natural Language Processing, pages 1–10.
  51. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  52. VoyageAI. 2024. Voyageai.
  53. Jailbroken: How does llm safety training fail? In Thirty-seventh Conference on Neural Information Processing Systems.
  54. Wizardlm: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244.
  55. Retrieval meets long context large language models. arXiv preprint arXiv:2310.03025.
  56. WikiQA: A challenge dataset for open-domain question answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 2013–2018, Lisbon, Portugal. Association for Computational Linguistics.
  57. Retrieval-augmented multimodal language modeling. arXiv preprint arXiv:2211.12561.
  58. Benchmarking and defending against indirect prompt injection attacks on large language models. arXiv preprint arXiv:2312.14197.
  59. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500.
  60. Assessing prompt injection risks in 200+ custom gpts. arXiv preprint arXiv:2311.11538.
  61. Enhancing financial sentiment analysis via retrieval augmented large language models. In Proceedings of the Fourth ACM International Conference on AI in Finance, pages 349–356.
  62. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115.
  63. Counterfactual memorization in neural language models. arXiv preprint arXiv:2112.12938.
  64. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675.
  65. Yiming Zhang and Daphne Ippolito. 2023. Prompts should not be seen as secrets: Systematically measuring prompt extraction attack success. arXiv preprint arXiv:2307.06865.
  66. Expel: Llm agents are experiential learners. arXiv preprint arXiv:2308.10144.
  67. Don’t forget private retrieval: distributed private similarity search for large language models. arXiv preprint arXiv:2311.12955.
Citations (9)
List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

  • The paper demonstrates that adversarial prompt injections enable the extraction of secured data from instruction-tuned RAG systems.
  • The experiments reveal that larger models exhibit increased susceptibility to data leakage, correlating model capacity with extraction risks.
  • The findings underscore the urgent need for robust defense mechanisms and privacy-preserving techniques to protect sensitive datastore content.

Vulnerability of Retrieval-Augmented Generation Systems to Data Extraction

The paper "Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems" presents an exploration into potential vulnerabilities of instruction-tuned Retrieval-Augmented Generation (RAG) systems to data extraction attacks. It offers a comprehensive analysis across a spectrum of LMs, such as Llama2, Mistral, and Vicuna, highlighting scalable risks related to sensitive data leaks. This research is of particular interest to the NLP community, where the potential for misuse of LMs is an increasing concern.

Overview of RAG Systems and Potential Vulnerabilities

RAG systems operate by integrating external datasets into LMs at the inference stage, augmenting the capabilities of the models with up-to-date or domain-specific information. This augmentation aims to address several known limitations of pre-trained LMs, such as hallucinations, context length limitations, and knowledge staleness. However, the mechanism that integrates external knowledge also creates potential avenues for datastore leakage. This paper hypothesizes that adversaries can exploit the LMs' propensity to follow instructions to reconstruct the data stored in these external databases.

The researchers propose a threat model assuming black-box access to the RAG systems. By leveraging advanced prompt injection techniques, attackers could retrieve and reproduce verbatim data meant to be secured within the datastores of RAG models. This vulnerability is accentuated as the size of the model increases, revealing a direct correlation between model capacity and the likelihood of data extraction.

Experimental Analysis

The authors conducted a series of experiments using both open-source models and production-level GPT-based RAG systems to evaluate the extent of this vulnerability. Key findings from these experiments include:

  • Instruction-Tuned LMs: By crafting specific adversarial queries, the research demonstrated able verbatim extraction of data using models like Llama2-Chat, Mistral-Instruct, SOLAR, and others. The extent of the vulnerability was significant, with larger model sizes (up to 70 billion parameters) showing a marked increase in extracted data.
  • Impact of Model Scaling: Across all evaluated models, there was a consistent increase in data extraction capabilities with increased model size, suggesting that more sophisticated models exhibit higher susceptibility to instruction-following manipulations that lead to data leaks.
  • Production Model Exploits: On a practical level, the research also showed that by utilizing similar adversarial techniques against customized GPT instances, complete system prompt information could be extracted. For instance, direct prompt injections achieved a 100% success rate in data leakage from several domain-specific GPTs within few interactions, further raising concerns about the deployment of these technologies in sensitive fields like medicine and law.

Implications and Future Directions

The findings of this paper have significant implications both theoretically and practically:

  • Theoretical Implications: This research underscores the critical need for developing robust defense mechanisms against prompt injection attacks. It further highlights the necessity for comprehensive evaluations of the context management practices in RAG systems.
  • Practical Considerations: Developers and stakeholders should implement stringent access controls and filtering mechanisms to prevent unauthorized access and exploitation of RAG systems. Since large models could inadvertently memorize private data from their training datasets, ensuring clean and safe datastore integration becomes paramount.
  • Future Research Directions: The academic community is urged to explore advanced privacy-preserving techniques, including effective mitigation strategies for data sanitization, deduplication, and customized instruction tuning methodologies to prevent memorization of sensitive information.

Overall, this paper contributes to the ongoing discourse on RAG safety and raises critical awareness about potential data breaches through advanced adversarial methods, thus encouraging a closer examination of security measures in future AI deployments.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.

Youtube Logo Streamline Icon: https://streamlinehq.com