Fact2Fiction: Attacking Agentic Fact-Checkers
- Fact2Fiction is a framework that decomposes complex claims into atomic sub-claims and strategically poisons the verification process.
- It employs dual agents to mimic decomposition workflows and craft adversarial justifications, achieving an 8.9%–21.2% higher attack success rate.
- The approach exploits system transparency in multi-hop reasoning, highlighting the need for defenses like evidence provenance and multi-layer verification.
Fact2Fiction refers both to the technical process by which agentic fact-checking systems, built from autonomous LLM-based agents, decompose complex claims into atomic sub-claims for granular verification, and to the adversarial attack paradigm, introduced in 2025, that targets this sub-claim verification lifecycle (He et al., 8 Aug 2025). The attack framework exposes and exploits systemic weaknesses in the explanation-driven, multi-hop reasoning pipelines central to modern scalable fact verification.
1. Agentic Fact-Checking System Architecture
Modern fact-checking processes (termed “agentic” systems) utilize autonomous LLM-based agents to break a given complex claim c into a set of sub-claims c_1, …, c_n. Each sub-claim is independently verified—using external evidence retrieval and reasoning models—and a verdict (with an explanatory justification) is produced for each; verdicts and justifications are then aggregated into a final system-level decision.
This decomposition–verification–aggregation workflow is intended to bolster interpretability, transparency, and robustness: each explanatory rationale clarifies the evidence or logical step crucial to the verdict on its sub-claim. This design has become standard in high-fidelity systems for combating misinformation, as seen in frameworks like DEFAME and InFact.
2. Fact2Fiction Attack Paradigm
Fact2Fiction, introduced in 2025 (He et al., 8 Aug 2025), is the first explicit poisoning attack framework targeting agentic fact-checking systems at the sub-claim level. The attack exploits the same decomposition workflow as the victim system: it emulates decomposition to generate surrogate sub-questions, utilizes the system-generated justifications to extract key evidence and reasoning patterns, and then strategically inserts targeted malicious documents into the knowledge base to compromise verification.
Fact2Fiction is implemented as a two-agent system:
- Planner Agent
  - Mirrors the victim system’s decomposition to obtain the same set of sub-claims c_1, …, c_n.
  - Calculates a weight score w_i for each sub-claim/evidence pair, based on its reasoning prominence in the victim’s justification.
  - Allocates the total poisoning budget across sub-claims in proportion to these weights (b_i ∝ w_i).
  - Crafts adversarial answers that explicitly contradict the system-derived verdicts and rationales.
  - Designs search queries that increase the semantic retrievability of the malicious evidence.
- Executor Agent
  - Constructs per-sub-claim evidence documents (q_i, t_i), where q_i is a tailored query and t_i is the crafted adversarial text.
  - Injects these documents into the evidence corpus so that they are retrieved during sub-claim verification.
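The Planner/Executor division of labor described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the class and function names (`SubClaim`, `allocate_budget`, `craft_documents`) and the proportional rounding rule are assumptions, and the adversarial text is a trivial placeholder for what the paper generates with an LLM.

```python
# Hypothetical sketch of Fact2Fiction's budget allocation and injection step.
from dataclasses import dataclass

@dataclass
class SubClaim:
    text: str
    weight: float  # reasoning prominence mined from the victim's justification

def allocate_budget(sub_claims, total_budget):
    """Planner: split the poisoning budget across sub-claims, b_i ∝ w_i."""
    total_w = sum(c.weight for c in sub_claims)
    return [round(total_budget * c.weight / total_w) for c in sub_claims]

def craft_documents(sub_claims, budgets):
    """Executor: pair a retrieval-optimized query q_i with adversarial text t_i."""
    corpus_additions = []
    for claim, n_docs in zip(sub_claims, budgets):
        for _ in range(n_docs):
            query = f"evidence about {claim.text}"  # tailored for retrievability
            text = f"Contrary to prior reports, {claim.text} is false."  # placeholder
            corpus_additions.append((query, text))
    return corpus_additions

claims = [SubClaim("personal gardening is unrestricted", 0.6),
          SubClaim("the policy took effect in 2024", 0.4)]
budgets = allocate_budget(claims, total_budget=10)  # higher-weight sub-claim gets more docs
docs = craft_documents(claims, budgets)
```

The key design point the sketch preserves is that the budget is not spread uniformly: sub-claims whose evidence dominates the victim's reasoning receive proportionally more poisoned documents.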
3. Mechanism and Exploitation of Justifications
Central to the Fact2Fiction attack is the strategic leveraging of system-provided justifications. By analyzing the explanations attached to prior verdicts, the Planner identifies pivotal reasoning steps and evidence sources. It then orchestrates the production of adversarial answers and evidence that specifically contradict or subvert those reasoning anchors, thus undermining the verification process at its most granular level.
For example, if a system relies on evidence supporting "personal gardening is unrestricted" to validate a claim, Fact2Fiction intentionally injects high-confidence evidence asserting that "personal gardening is strictly regulated," using retrieval-optimized phrasing and rationale cues derived directly from the system's previous justifications.
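The gardening example above amounts to flipping the reasoning anchor found in a justification. A minimal sketch of that flip, assuming a toy lookup table of phrase opposites (the paper uses LLM-generated contradictions, not string replacement):

```python
# Hypothetical sketch: mine a victim justification for its pivotal phrase and
# emit a contradicting adversarial statement. The FLIP table is illustrative.
FLIP = {"unrestricted": "strictly regulated", "supported": "refuted"}

def contradict(justification: str) -> str:
    """Return the justification with its key reasoning anchors inverted."""
    out = justification
    for term, opposite in FLIP.items():
        out = out.replace(term, opposite)
    return out

j = "The claim holds because personal gardening is unrestricted."
adv = contradict(j)  # now asserts the opposite reasoning anchor
```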
4. Experimental Evaluation and Comparative Performance
Fact2Fiction was evaluated against state-of-the-art benchmarks and agentic fact-checkers (including DEFAME, InFact, and a naive “Simple” retrieval pipeline) using the real-world AVeriTeC dataset and varying poisoning budgets (e.g., 0.1%, 1%, 2%, ...). Results show:
- Attack Success Rate (ASR): Fact2Fiction achieves 8.9%–21.2% higher ASR than previous attacks (including PoisonedRAG).
- Budget Efficiency: Superior performance even at minimal poisoning rates; in some cases, Fact2Fiction achieved comparable ASR to baselines with 16× lower injection budget.
- Aggregation Effects: Larger budgets increased both ASR and System Fail Rate (SFR), but Fact2Fiction’s targeted allocation made attack efficiency notably high.
These empirical findings confirm that decomposition-aware, rationale-guided poisoning is markedly more effective than previous generic or claim-level attacks.
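The two metrics reported above can be computed as follows, under the common definitions: ASR is the fraction of attacked claims whose final verdict matches the attacker's target, and SFR is the fraction whose final verdict is simply wrong. The exact definitions in the paper may differ in detail.

```python
# Sketch of the ASR and SFR metrics under assumed standard definitions.
def attack_success_rate(final_verdicts, target_verdicts):
    """Fraction of claims where the system output the attacker's target verdict."""
    hits = sum(f == t for f, t in zip(final_verdicts, target_verdicts))
    return hits / len(final_verdicts)

def system_fail_rate(final_verdicts, gold_verdicts):
    """Fraction of claims where the system's verdict disagrees with ground truth."""
    misses = sum(f != g for f, g in zip(final_verdicts, gold_verdicts))
    return misses / len(final_verdicts)

final  = ["refuted", "supported", "refuted", "supported"]
target = ["refuted", "supported", "supported", "supported"]
gold   = ["supported", "refuted", "refuted", "supported"]
asr = attack_success_rate(final, target)  # 3/4 = 0.75
sfr = system_fail_rate(final, gold)       # 2/4 = 0.5
```

Note that ASR and SFR can diverge: a verdict can be wrong without matching the attacker's target, which is why the paper reports both.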
5. Vulnerabilities and Defensive Challenges
The attack demonstrates notable security weaknesses inherent in current agentic fact-checkers:
- System transparency, designed for interpretability (via justification/rationale outputs), facilitates adversarial reverse-engineering of reasoning and evidence emphasis.
- Traditional defenses—including paraphrasing, perplexity filtering, and clustering-based anomaly detection—are largely ineffective against Fact2Fiction, which produces semantically rich, naturalistic adversarial evidence tailored to individual reasoning steps rather than the global narrative level.
- The very multi-agent decomposition that is meant to enhance robustness creates discrete targets for tailored evidence poisoning and skews verdicts during aggregation.
Proposed countermeasures include:
| Defensive Strategy | Objective | Notional Limitation |
| --- | --- | --- |
| Concealment of reasoning | Reduce transparency of justifications | May reduce interpretability and user trust |
| Provenance analysis | Trace the origin and semantic validity of evidence | Computationally intensive; adversarial adaptation likely |
| Multi-layer verification | Cross-source evidence validation | Adds latency; external sources may also be compromised |
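As a concrete illustration of the provenance-analysis row, a minimal source allow-list filter applied before retrieved documents reach the verifier. This is a sketch under assumptions (the field names and the `TRUSTED` set are hypothetical); real provenance analysis would also validate semantic consistency, as the table notes, and an allow-list alone is exactly the kind of defense an adaptive adversary would target.

```python
# Hypothetical provenance filter: drop retrieved documents whose source is not
# on an allow-list before they are passed to the sub-claim verifier.
TRUSTED = {"reuters.com", "apnews.com", "gov.uk"}

def provenance_filter(retrieved):
    """Keep only documents whose 'source' field is in the trusted set."""
    return [d for d in retrieved if d.get("source") in TRUSTED]

docs = [{"text": "gardening is strictly regulated", "source": "unknown-blog.net"},
        {"text": "gardening is unrestricted", "source": "reuters.com"}]
clean = provenance_filter(docs)  # the injected document from the unknown source is dropped
```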
6. Ethical Considerations and Broader Impacts
The explicit publication of decomposition-targeted attack methods like Fact2Fiction raises substantial ethical risks by enabling adversaries to deliberately subvert trust in fact-checking outputs. At the same time, these revelations are crucial for defensively exposing and remedying systemic vulnerabilities in information verification architectures. Widespread deployment of such attacks, if unmitigated, could undermine confidence in digital information ecosystems and facilitate new waves of misinformation.
A plausible implication is that fact-checking systems must balance transparency, adversarial resilience, and interpretability—potentially through modular trust scores, evidence provenance auditing, and obfuscation of specific reasoning steps in public-facing outputs.
7. Conclusion
Fact2Fiction marks a significant shift in adversarial threat modeling for agentic fact-checking. By mirroring decomposition pipelines and exploiting system-generated justifications for evidence poisoning, it is the first attack framework capable of efficiently and precisely subverting multi-hop, explanation-driven claim verification. Experimental results underscore its advantage over previous methods across a range of attack budgets and system architectures. These findings emphasize the urgent need for defensive innovation in the design of robust, transparent, and secure fact-verification pipelines for autonomous agents (He et al., 8 Aug 2025).