Certifiably Robust RAG against Retrieval Corruption (2405.15556v1)
Abstract: Retrieval-augmented generation (RAG) has been shown to be vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into the retrieval results to induce inaccurate responses. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we obtain LLM responses from each passage in isolation and then securely aggregate these isolated responses. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG can achieve certifiable robustness: we can formally prove and certify that, for certain queries, RobustRAG always returns accurate responses, even when the attacker has full knowledge of our defense and can arbitrarily inject a small number of malicious passages. We evaluate RobustRAG on open-domain QA and long-form text generation datasets and demonstrate its effectiveness and generalizability across various tasks and datasets.
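To make the isolate-then-aggregate idea concrete, below is a minimal sketch of a keyword-style secure aggregation, written under stated assumptions: it assumes a generic `llm(prompt) -> str` callable and uses naive whitespace keyword extraction with a simple count threshold `min_count`. It illustrates the general strategy (each passage influences at most one isolated response, and only keywords supported by enough responses survive), not the paper's exact algorithms or certification procedure.

```python
from collections import Counter
from typing import Callable, List


def robust_keyword_aggregation(
    query: str,
    passages: List[str],
    llm: Callable[[str], str],  # assumed LLM interface: prompt in, text out
    min_count: int = 3,         # keep keywords appearing in >= this many isolated responses
) -> str:
    """Illustrative isolate-then-aggregate sketch (not the paper's exact algorithm)."""
    # Step 1: isolated responses, one per retrieved passage, so a single
    # corrupted passage can affect at most one response.
    responses = [
        llm(f"Context: {p}\nQuestion: {query}\nAnswer concisely:") for p in passages
    ]

    # Step 2: naive keyword extraction (lower-cased tokens); count in how
    # many isolated responses each keyword appears.
    counts: Counter = Counter()
    for r in responses:
        counts.update(set(r.lower().split()))

    # Step 3: keep only keywords supported by enough isolated responses;
    # a few injected passages cannot push a keyword past the threshold.
    kept = sorted(w for w, c in counts.items() if c >= min_count)

    # Final aggregation: compose an answer from the high-agreement keywords only.
    return llm(
        f"Question: {query}\n"
        f"Using only these keywords: {', '.join(kept)}\n"
        "Write a short answer:"
    )
```

Intuitively, if an attacker controls at most k of the retrieved passages, they can corrupt at most k isolated responses, so any keyword whose benign support exceeds the threshold by a sufficient margin survives aggregation; this margin-based argument is what the paper's certification formalizes.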