
Certifiably Robust RAG against Retrieval Corruption (2405.15556v1)

Published 24 May 2024 in cs.LG, cs.CL, and cs.CR

Abstract: Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate responses. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we get LLM responses from each passage in isolation and then securely aggregate these isolated responses. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG can achieve certifiable robustness: we can formally prove and certify that, for certain queries, RobustRAG can always return accurate responses, even when the attacker has full knowledge of our defense and can arbitrarily inject a small number of malicious passages. We evaluate RobustRAG on open-domain QA and long-form text generation datasets and demonstrate its effectiveness and generalizability across various tasks and datasets.


Summary

  • The paper proposes RobustRAG, a framework that isolates and aggregates responses to ensure certifiable robustness against retrieval corruption attacks.
  • It employs keyword and decoding aggregation techniques to maintain high accuracy, achieving up to 71.0% certifiable accuracy on benchmarks like RealtimeQA.
  • The framework incurs minimal performance trade-offs, keeping clean-accuracy losses below 11% while significantly lowering attack success rates.

Certifiably Robust RAG against Retrieval Corruption

The paper addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to retrieval corruption attacks, in which attackers inject malicious passages into the retrieval results to induce inaccurate responses. To counter this threat, the authors propose RobustRAG, a defense framework that provides certifiable robustness against such attacks.

Background and Motivation

LLMs like GPT-3.5 and Mistral-7B are limited by parametric knowledge that is incomplete and often outdated. RAG improves response accuracy by incorporating relevant external knowledge retrieved from large knowledge bases, and it powers several popular AI applications such as Microsoft Bing Chat and Perplexity AI. However, RAG's reliance on external sources makes it susceptible to retrieval corruption, where attackers insert malicious passages to manipulate the system's outputs.

RobustRAG Framework

RobustRAG utilizes an isolate-then-aggregate strategy to mitigate the impact of malicious passages:

  1. Isolation: Responses are generated from each retrieved passage in isolation to prevent an attacker-influenced passage from contaminating others.
  2. Aggregation: The isolated responses are then securely aggregated to produce the final output (a minimal sketch of this control flow follows this list).
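
The control flow below is a minimal sketch of the isolate-then-aggregate strategy, not the authors' implementation; `llm_answer` and `secure_aggregate` are hypothetical placeholders for an LLM call and one of the aggregation rules described next.

```python
def robust_rag_answer(query, retrieved_passages, llm_answer, secure_aggregate):
    """Illustrative isolate-then-aggregate pipeline (hypothetical interfaces)."""
    # Isolation: query the LLM once per passage, so a malicious passage can
    # only influence the single response computed from it.
    isolated_responses = [
        llm_answer(query=query, context=[passage])
        for passage in retrieved_passages
    ]
    # Aggregation: combine the isolated responses with a robust rule
    # (e.g., keyword or decoding aggregation) rather than trusting any
    # single response.
    return secure_aggregate(query, isolated_responses)
```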

Two techniques for secure response aggregation are proposed:

  • Keyword Aggregation: Extracts keywords from each isolated response, counts how many responses support each keyword, and keeps only sufficiently frequent keywords, from which the LLM composes the final answer (see the sketch after this list).
  • Decoding Aggregation: Aggregates the next-token probability vectors predicted from each isolated passage at every decoding step, emitting a token only when the aggregated prediction is sufficiently consistent and confident across passages.
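
A hedged sketch of the keyword-style aggregation is shown below; `extract_keywords`, `llm_compose`, and the `min_count` threshold are illustrative assumptions, and the paper's exact extraction and thresholding rules may differ.

```python
from collections import Counter

def keyword_aggregate(isolated_responses, extract_keywords, llm_compose, min_count=2):
    """Illustrative keyword aggregation over isolated responses."""
    counts = Counter()
    for response in isolated_responses:
        # Count each keyword at most once per isolated response, so a single
        # corrupted passage contributes at most +1 to any keyword's count.
        counts.update(set(extract_keywords(response)))
    # Keep only keywords supported by at least `min_count` isolated responses.
    kept = sorted(kw for kw, c in counts.items() if c >= min_count)
    # Ask the LLM to compose the final answer from the surviving keywords.
    return llm_compose(kept)
```

Because each passage can raise any keyword's count by at most one, a few injected passages can only nudge the counts by a bounded amount; this bounded influence is the kind of property the certification analysis in the next section builds on.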

Theoretical Foundations

The authors formalize certifiable robustness: under an attack model that allows up to $k'$ malicious passages among the top $k$ retrieved, RobustRAG's responses are guaranteed to meet a given accuracy threshold $\tau$ for certified queries. The certification builds on the aggregation techniques above, showing that RobustRAG's architecture inherently bounds the influence that a small number of corrupted passages can exert on the aggregated output.
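
Stated as a single condition, the robustness goal can be paraphrased as follows; the notation here (in particular $\mathcal{A}_{k'}$ for the set of admissible corruptions) is chosen for illustration rather than taken verbatim from the paper:

```latex
% Paraphrased certifiable-robustness goal (illustrative notation).
% P_k        : top-k retrieved passages for query q
% A_{k'}(P_k): all corruptions of P_k containing at most k' injected passages
% metric     : task accuracy metric; tau: certified accuracy threshold
\forall\, \mathcal{P}' \in \mathcal{A}_{k'}(\mathcal{P}_k):\quad
\mathrm{metric}\bigl(\mathrm{RobustRAG}(q,\ \mathcal{P}'),\ \text{gold answer}\bigr) \;\ge\; \tau
```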

Experimental Evaluation

The experimental setup encompasses various datasets including RealtimeQA, Natural Questions, and Biography Generation, and evaluates the system across different LLMs such as Mistral-7B-Instruct and Llama2-7B-Chat.

Key Findings:

  • Certifiable Robustness: RobustRAG consistently achieves notable certifiable accuracy across different datasets and models. For example, on the RealtimeQA-MC dataset, certifiable accuracy reaches up to 71.0%.
  • Clean Performance: Despite enhancing robustness, RobustRAG preserves most of its clean performance, with clean-accuracy drops below 11% in most cases, demonstrating minimal performance trade-offs.
  • Empirical Robustness: Against prompt injection and data poisoning attacks, RobustRAG shows substantial resilience, reducing attack success rates to below 10% in almost all cases.

Practical and Theoretical Implications

Practically, RobustRAG assures users and developers of AI applications that their systems can maintain high reliability even in the presence of sophisticated retrieval corruption attacks. Theoretically, the framework extends robustness analysis and certification to complex generative tasks, showcasing a method applicable beyond simple classification tasks.

Future Developments and Considerations

While RobustRAG provides a robust framework against retrieval corruption, several aspects warrant further investigation:

  • Retrieval Step Hardening: Strengthening the retrieval process itself to prevent the inclusion of malicious passages.
  • Multi-hop Queries: Extending RobustRAG to handle complex multi-hop queries effectively.
  • Minimizing Performance Trade-offs: Further reducing the clean performance drop to encourage broader adoption.
  • Integration with Advanced RAG Techniques: Combining RobustRAG with advanced RAG techniques such as self-critique and fine-tuning to further enhance its robustness and accuracy.

Conclusions

RobustRAG marks a significant advancement in robust AI, particularly for retrieval-augmented language systems. The isolate-then-aggregate strategy, combined with rigorous robustness certification, ensures that systems can withstand targeted attacks aimed at corrupting retrieved information. This work not only demonstrates the feasibility of building robust retrieval-augmented generative systems but also lays a foundation for future work on robust, certifiably secure AI applications.
