HijackRAG: Hijacking Attacks against Retrieval-Augmented Large Language Models (2410.22832v1)
Abstract: Retrieval-Augmented Generation (RAG) systems enhance LLMs by integrating external knowledge, making them adaptable and cost-effective for various applications. However, the growing reliance on these systems also introduces potential security risks. In this work, we reveal a novel vulnerability, the retrieval prompt hijack attack (HijackRAG), which enables attackers to manipulate the retrieval mechanisms of RAG systems by injecting malicious texts into the knowledge database. When the RAG system encounters target questions, it generates the attacker's pre-determined answers instead of the correct ones, undermining the integrity and trustworthiness of the system. We formalize HijackRAG as an optimization problem and propose both black-box and white-box attack strategies tailored to different levels of the attacker's knowledge. Extensive experiments on multiple benchmark datasets show that HijackRAG consistently achieves high attack success rates, outperforming existing baseline attacks. Furthermore, we demonstrate that the attack is transferable across different retriever models, underscoring the widespread risk it poses to RAG systems. Lastly, our exploration of various defense mechanisms reveals that they are insufficient to counter HijackRAG, emphasizing the urgent need for more robust security measures to protect RAG systems in real-world deployments.
- Self-rag: Learning to retrieve, generate, and critique through self-reflection. arXiv preprint arXiv:2310.11511.
- Chase, H. 2022. LangChain. https://www.langchain.com.
- Chatlaw: Open-source legal large language model with integrated external knowledge bases. arXiv preprint arXiv:2306.16092.
- The Llama 3 Herd of Models. arXiv preprint arXiv:2407.21783.
- HotFlip: White-Box Adversarial Examples for Text Classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 31–36.
- ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools. arXiv preprint arXiv:2406.12793.
- Not what you’ve signed up for: Compromising real-world llm-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 79–90.
- REALM: retrieval-augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning, 3929–3938.
- Towards unsupervised dense information retrieval with contrastive learning. arXiv preprint arXiv:2112.09118, 2(3).
- Baseline defenses for adversarial attacks against aligned language models. arXiv preprint arXiv:2309.00614.
- Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12): 1–38.
- Jones, K. S. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of documentation, 28(1): 11–21.
- Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
- Natural questions: a benchmark for question answering research. Transactions of the Association for Computational Linguistics, 7: 453–466.
- Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33: 9459–9474.
- Liu, J. 2023. LlamaIndex. https://www.llamaindex.ai.
- Recall: A benchmark for llms robustness against external counterfactual knowledge. arXiv preprint arXiv:2311.08147.
- MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. choice, 2640: 660.
- OpenAI. 2023. GPT data retrieval. https://platform.openai.com/docs/actions/data-retrieval.
- GPT-4 Technical Report. arXiv:2303.08774.
- Fine-tuning or retrieval? comparing knowledge injection in llms. arXiv preprint arXiv:2312.05934.
- Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813.
- Ignore Previous Prompt: Attack Techniques For Language Models. In NeurIPS ML Safety Workshop.
- Petitcolas, F. A. 2023. Kerckhoffs’ principle. In Encyclopedia of Cryptography, Security and Privacy, 1–2. Springer.
- Language Models as Knowledge Bases? In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics.
- Retool. 2022. State of AI Report 2024. https://retool.com/blog/state-of-ai-h1-2024.
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
- Ignore this title and HackAPrompt: Exposing systemic vulnerabilities of LLMs through a global prompt hacking competition. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 4945–4977.
- LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions. In The 61st Annual Meeting Of The Association For Computational Linguistics.
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, 10303–10315.
- Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. In International Conference on Learning Representations.
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
- Poisoning Retrieval Corpora by Injecting Adversarial Passages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 13764–13775.
- DocPrompting: Generating Code by Retrieving the Docs. In The Eleventh International Conference on Learning Representations.
- Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models. arXiv preprint arXiv:2402.07867.