POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models (2505.06579v1)

Published 10 May 2025 in cs.CR

Abstract: LLMs have achieved remarkable success in various domains, primarily due to their strong capabilities in reasoning and generating human-like text. Despite their impressive performance, LLMs are susceptible to hallucinations, which can lead to incorrect or misleading outputs. This is primarily due to the lack of up-to-date knowledge or domain-specific information. Retrieval-augmented generation (RAG) is a promising approach to mitigate hallucinations by leveraging external knowledge sources. However, the security of RAG systems has not been thoroughly studied. In this paper, we study a poisoning attack on RAG systems named POISONCRAFT, which can mislead the model to refer to fraudulent websites. Compared to existing poisoning attacks on RAG systems, our attack is more practical as it does not require access to the target user query's info or edit the user query. It not only ensures that injected texts can be retrieved by the model, but also ensures that the LLM will be misled to refer to the injected texts in its response. We demonstrate the effectiveness of POISONCRAFTacross different datasets, retrievers, and LLMs in RAG pipelines, and show that it remains effective when transferred across retrievers, including black-box systems. Moreover, we present a case study revealing how the attack influences both the retrieval behavior and the step-by-step reasoning trace within the generation model, and further evaluate the robustness of POISONCRAFTunder multiple defense mechanisms. These results validate the practicality of our threat model and highlight a critical security risk for RAG systems deployed in real-world applications. We release our code\footnote{https://github.com/AndyShaw01/PoisonCraft} to support future research on the security and robustness of RAG systems in real-world settings.

Summary

The paper introduces PoisonCraft, a novel, practical, query-agnostic poisoning attack specifically designed for Retrieval-Augmented Generation (RAG) systems.
PoisonCraft is a query-agnostic attack utilizing specific poisoned document structures to manipulate both retrieval and LLM generation.
The research demonstrates PoisonCraft's high attack success rate against various RAG setups and its resilience to common defense mechanisms evaluated in the study.

PoisonCraft: Practical Poisoning of Retrieval-Augmented Generation for LLMs

The paper "PoisonCraft: Practical Poisoning of Retrieval-Augmented Generation for LLMs" addresses the vulnerability of Retrieval-Augmented Generation (RAG) systems to poisoning attacks. As LLMs become integral in various applications, ensuring their reliability and security is paramount. Despite the advancement of these models in mimicking human-like text and reasoning, their susceptibility to hallucinations due to outdated knowledge necessitates the augmentation with external knowledge sources through RAG systems. However, the security aspects of these systems have not been thoroughly explored.

Attack Design and Features

PoisonCraft introduces a novel attack methodology specifically targeting RAG systems. Traditional poisoning attacks often entail unrealistic assumptions, such as requiring access to the user queries or editing the queries themselves. PoisonCraft circumvents these by implementing a query-agnostic framework that conducts end-to-end manipulation without needing prior knowledge of the user query or modifications of the query inputs. This approach makes PoisonCraft particularly stealthy and practical for real-world scenarios.

The attack follows an innovative structure: each poisoned document encompasses injected poisoned knowledge, frequency-based anchors derived from a shadow query set, and an adversarial suffix optimized to entice retrieval systems. By leveraging these components, PoisonCraft ensures not only the retrieval of poisoned samples by the target retriever but also that the LLM is coerced into generating responses influenced by malicious content. Thus, the attack achieves its goal of promoting fraudulent websites while maintaining the facade of factual accuracy in responses.

Evaluation Metrics and Results

The research meticulously evaluates PoisonCraft across various datasets, retrievers, and LLMs. Experiments demonstrate its efficacy using top- $k$ retrieval settings where PoisonCraft consistently outperforms baseline approaches such as Prompt Injection and PoisonedRAG in both Attack Success Rate for Retrieval (ASR-r) and Attack Success Rate for Target (ASR-t) measurements. Notably, PoisonCraft achieves significant improvement in ASR-r and ASR-t across different scenarios, illustrating its robustness against diverse query distributions and retriever architectures.

Moreover, the paper validates PoisonCraft's transferability to other retrievers, including proprietary black-box models. This is accomplished by showcasing how poisoned samples maintain a degree of effectiveness even when subjected to different retriever environments. The described attack successfully manipulates both retrieval and post-retrieval stages, influencing models with reasoning capabilities to incorporate adversarial cues into their generation processes.

Defense Mechanisms and Limitations

The paper also explores the defenses against such poisoning attacks, employing strategies at various levels within the RAG pipeline—query, knowledge base, and context sanitization. Paraphrasing defenses, duplicate text filtering, and reranker filtering are among the employed mechanisms. Yet, PoisonCraft demonstrates resilience against these defenses, attributing its success to the thorough and varied optimization of adversarial content.

Despite the effectiveness of PoisonCraft, the paper acknowledges certain limitations. The approach, while robust, might still benefit from enhancement against stronger defensive models, and the adaptability of the attack across new application domains remains a subject for future exploration.

Implications for Future AI Research

The implications of this research are multifaceted. Practically, it underscores the necessity for improved security measures in RAG systems to protect against subtle poisoning attacks that could compromise the integrity and reliability of LLM-driven applications. Theoretically, it opens avenues for further exploration into secure model design that integrates awareness and mitigation of malicious influences throughout both retrieval and generation stages.

In summary, this paper advances the understanding of vulnerabilities inherent in RAG systems and proposes a formidable framework for exploiting these weaknesses. Future developments in this space will likely focus on reinforcing model architectures against such threats, ensuring that LLMs can serve their intended purpose without the risk of delivering compromised or misleading information.

Related Papers

Find Related Papers

GitHub

GitHub - AndyShaw01/PoisonCraft: This repository provides the official implementation of POISONCRAFT: Practical Poisoning of Retrieval-Augmented Generation for Large Language Models. (1 star)

Tweets

https://twitter.com/FSFG/status/1922471728057352441

YouTube

Show All Videos