ArgRAG: Explainable Retrieval-Enhanced Framework
- ArgRAG is an explainable retrieval-augmented generation framework that integrates evidence retrieval with formal argumentative reasoning using QBAFs.
- It employs deterministic quadratic energy gradual semantics to compute argument strengths through explicit support and attack relationships.
- Evaluated on PubHealth and RAGuard benchmarks, ArgRAG outperforms conventional systems in handling noisy, contradictory evidence in critical domains.
ArgRAG is an explainable retrieval-augmented generation framework that combines evidence retrieval with formal argumentative reasoning using Quantitative Bipolar Argumentation Frameworks (QBAFs). It is designed to overcome persistent limitations of conventional RAG systems in high-stakes domains, namely their sensitivity to contradictory or noisy evidence and their reliance on stochastic, black-box decision-making. By structuring retrieved evidence as arguments linked by support and attack relations, and by performing deterministic strength inference under gradual semantics, ArgRAG achieves greater transparency, contestability, and robustness when adjudicating factual claims. Its efficacy has been demonstrated on the PubHealth and RAGuard fact verification benchmarks, where it attains competitive accuracy while producing interpretable explanations by construction (Zhu et al., 26 Aug 2025).
1. System Architecture: Retrieval, Argument Construction, and Relation Annotation
Given a natural language claim, ArgRAG first executes document retrieval via models such as Contriever-MS MARCO. Each retrieved document is treated as an argument and, together with the claim itself, forms the set of nodes $\mathcal{A}$ of the QBAF. Every argument is initialized with a uniform base score $\tau(a) = 0.5$, representing an uncommitted stance before reasoning.
Relation annotation operates in two steps using LLM prompting:
- Evidence Polarity Classification: Each retrieved candidate is labeled as "support," "contradict," or "irrelevant" with respect to the claim; irrelevant documents are filtered out.
- Pairwise Relation Annotation: For the remaining arguments, all pairs are classified to detect internal support or attack relations among the retrieved evidence. The resulting directed edges are added to the support relation $\mathcal{R}^+$ and the attack relation $\mathcal{R}^-$.
This process yields a QBAF defined over the claim and its relevant evidence, with explicit support and conflict structure.
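To make the construction step concrete, the following minimal sketch shows one way to assemble such a QBAF in Python. It is illustrative only: the `llm_classify_polarity` and `llm_classify_pair` helpers are hypothetical stand-ins for the LLM prompting calls, and the data structure does not reflect a released ArgRAG implementation.

```python
# Illustrative QBAF construction from retrieved documents.
# The llm_classify_* helpers are hypothetical placeholders for LLM prompts.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class QBAF:
    arguments: List[str]                                            # claim + retained documents
    supports: List[Tuple[str, str]] = field(default_factory=list)   # (b, a) in R+: b supports a
    attacks: List[Tuple[str, str]] = field(default_factory=list)    # (b, a) in R-: b attacks a
    base_score: float = 0.5                                         # uniform, uncommitted tau(a)

def llm_classify_polarity(claim: str, doc: str) -> str:
    """Hypothetical LLM call: returns 'support', 'contradict', or 'irrelevant'."""
    raise NotImplementedError

def llm_classify_pair(src: str, dst: str) -> str:
    """Hypothetical LLM call: returns 'support', 'attack', or 'none' for the edge src -> dst."""
    raise NotImplementedError

def build_qbaf(claim: str, retrieved_docs: List[str]) -> QBAF:
    qbaf = QBAF(arguments=[claim])
    # Step 1: evidence polarity with respect to the claim; drop irrelevant documents.
    for doc in retrieved_docs:
        label = llm_classify_polarity(claim, doc)
        if label == "irrelevant":
            continue
        qbaf.arguments.append(doc)
        (qbaf.supports if label == "support" else qbaf.attacks).append((doc, claim))
    # Step 2: pairwise relations among the retained evidence arguments.
    docs = qbaf.arguments[1:]
    for i, di in enumerate(docs):
        for dj in docs[i + 1:]:
            for src, dst in ((di, dj), (dj, di)):
                rel = llm_classify_pair(src, dst)
                if rel == "support":
                    qbaf.supports.append((src, dst))
                elif rel == "attack":
                    qbaf.attacks.append((src, dst))
    return qbaf
```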
2. Quantitative Bipolar Argumentation Framework (QBAF) and Gradual Semantics
The QBAF is a structured network in which nodes are arguments, edges encode supporting ($\mathcal{R}^+$) or attacking ($\mathcal{R}^-$) relations, and each node $a$ carries an initial base belief $\tau(a) \in [0,1]$.
Deterministic inference adopts quadratic energy (QE) gradual semantics to calculate argument strengths. For each argument $a$, the net support/attack influence is

$$E_a \;=\; \sum_{b:\,(b,a)\in\mathcal{R}^+} \sigma(b) \;-\; \sum_{b:\,(b,a)\in\mathcal{R}^-} \sigma(b).$$

The strength update rule is then

$$\sigma(a) \;=\; \tau(a) \;-\; \tau(a)\,h(-E_a) \;+\; \bigl(1-\tau(a)\bigr)\,h(E_a),$$

where the nonlinear activation is $h(x) = \max(x,0)^2 / \bigl(1+\max(x,0)^2\bigr)$. Iterating these updates converges to an equilibrium that determines the final score $\sigma(a)$ for each argument. For the claim $c$, if $\sigma(c) > 0.5$ the claim is classified as true; otherwise, false.
This mechanism provides continuous-valued, deterministic inference, in sharp contrast to the stochastic sampling of traditional RAG models.
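The QE computation can be written compactly. The sketch below implements the iteration above under the standard quadratic energy formulation; the names `qe_strengths`, `base`, `supports`, and `attacks` are illustrative rather than taken from the source code.

```python
# Minimal sketch of quadratic-energy (QE) gradual semantics on a QBAF.
# Names and structure are illustrative, not a reference implementation.
from typing import Dict, List, Tuple

def h(x: float) -> float:
    """Nonlinear activation h(x) = max(x, 0)^2 / (1 + max(x, 0)^2)."""
    m = max(x, 0.0)
    return m * m / (1.0 + m * m)

def qe_strengths(
    base: Dict[str, float],                 # tau(a): base score per argument
    supports: List[Tuple[str, str]],        # (b, a) in R+: b supports a
    attacks: List[Tuple[str, str]],         # (b, a) in R-: b attacks a
    iters: int = 1000,
    tol: float = 1e-8,
) -> Dict[str, float]:
    """Iterate the QE update until the strengths sigma(a) converge."""
    sigma = dict(base)                      # initialize sigma with the base scores
    for _ in range(iters):
        new_sigma = {}
        for a, tau in base.items():
            # Net energy: summed strength of supporters minus attackers of a.
            e = sum(sigma[b] for b, tgt in supports if tgt == a) \
                - sum(sigma[b] for b, tgt in attacks if tgt == a)
            new_sigma[a] = tau - tau * h(-e) + (1.0 - tau) * h(e)
        if max(abs(new_sigma[a] - sigma[a]) for a in sigma) < tol:
            return new_sigma
        sigma = new_sigma
    return sigma

# Toy example: claim c with two supporting and one attacking document.
base = {"c": 0.5, "d1": 0.5, "d2": 0.5, "d3": 0.5}
strengths = qe_strengths(base,
                         supports=[("d1", "c"), ("d2", "c")],
                         attacks=[("d3", "c")])
verdict = strengths["c"] > 0.5   # True -> claim accepted
```

In the toy example, two supporters and one attacker give the claim a net energy of $0.5$, lifting its strength from $0.5$ to $0.6$, so the claim is accepted.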
3. Explainability and User Contestability
ArgRAG supports faithful, auditable explanation at decision time:
- The reasoning trail is manifest in the constructed QBAF, allowing each verdict (e.g., claim is supported or contradicted) to be traced through the network structure and the influence of each supporting or attacking argument.
- Score evolution can be visualized, and users may inspect or modify base scores and edge polarities; the QBAF inference then updates deterministically to reflect these changes, providing contestability of the decision.
- Such explicit, structured reasoning and transparency are not available in black-box RAG architectures, which can neither faithfully explain their generation process nor allow it to be contested (a brief contestability sketch follows this list).
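As a brief illustration of contestability, reusing the illustrative `qe_strengths` helper from the previous sketch: a user who disputes the polarity of one evidence edge can flip it and re-run the deterministic inference, and any change in the claim's strength is then fully attributable to that edit.

```python
# Contestability sketch: a user challenges evidence d3 by flipping its edge
# from attack to support and re-running the deterministic QE inference.
# Reuses the illustrative qe_strengths helper from the earlier sketch.
base = {"c": 0.5, "d1": 0.5, "d2": 0.5, "d3": 0.5}

original = qe_strengths(base, supports=[("d1", "c"), ("d2", "c")],
                        attacks=[("d3", "c")])
contested = qe_strengths(base, supports=[("d1", "c"), ("d2", "c"), ("d3", "c")],
                         attacks=[])

# Both verdicts are reproducible and auditable: the shift in sigma(c) comes
# entirely from the user's edit, with no stochastic variation.
print(original["c"], contested["c"])
```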
4. Empirical Evaluation and Robustness
Experimental evaluation of ArgRAG on the PubHealth and RAGuard datasets demonstrates its effectiveness:
- Accuracy on PubHealth with Top-5 retrieval reaches approximately $0.835$–$0.892$ across different LLM backbones (GPT-3.5, GPT-4o-mini, GPT-4.1-mini).
- ArgRAG consistently outperforms conventional RAG systems and no-retriever baselines, especially under conditions with conflicting or noisy evidence typical in healthcare and politically charged domains.
- Performance remains robust across retrieval depths, and experiments confirm that the structured, deterministic reasoning insulates ArgRAG from the instability that noisy evidence induces in probabilistic autoregressive generation.
5. Applications in High-Stakes Domains
The explicit, contestable reasoning of ArgRAG is particularly valued in settings where reliability and transparency are paramount:
- Healthcare: Clinical and biomedical claim verification, evidentiary traceability, and decision auditing.
- Legal and Financial Domains: Compliance, justification, and accountability for rule-based claim adjudication.
- Political Fact-Checking: Robust handling and explanation of contradictory or misleading claims in adversarial settings.
In all these domains, the ability to interactively review, explain, and contest the inference process provides significant advances in trustworthiness and decision support over prior opaque approaches.
6. Technical and Algorithmic Details
ArgRAG implements Algorithm 1 as described in the source:
- Retrieve evidence to assemble the argument set $\mathcal{A}$ (claim and documents).
- Annotate support/contradict/irrelevant relations via LLM prompts; filter out irrelevant arguments.
- For non-irrelevant arguments, annotate pairwise relations to extend $\mathcal{R}^+$ and $\mathcal{R}^-$.
- Apply QE gradual semantics iteratively, updating $\sigma(a)$ for each $a \in \mathcal{A}$ until convergence.
- Predict claim truth by checking $\sigma(c)$ against the threshold (default $0.5$).
All components—including relation annotation, strength updates, and verdict assignment—are amenable to visualization and interactive examination.
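Assuming the `build_qbaf` and `qe_strengths` sketches given earlier, the overall flow of Algorithm 1 condenses to a few lines; `retrieve_documents` is a hypothetical placeholder for the retrieval backend.

```python
# Condensed end-to-end sketch of the ArgRAG pipeline (Algorithm 1), assuming
# the illustrative build_qbaf and qe_strengths helpers sketched above.
from typing import List

def retrieve_documents(claim: str, k: int = 5) -> List[str]:
    """Hypothetical retriever call (e.g., Top-k retrieval with Contriever-MS MARCO)."""
    raise NotImplementedError

def argrag_verify(claim: str, k: int = 5, threshold: float = 0.5) -> bool:
    docs = retrieve_documents(claim, k)                      # 1. retrieval
    qbaf = build_qbaf(claim, docs)                           # 2-3. annotation and filtering
    base = {a: qbaf.base_score for a in qbaf.arguments}      # uniform tau(a) = 0.5
    sigma = qe_strengths(base, qbaf.supports, qbaf.attacks)  # 4. QE inference to convergence
    return sigma[claim] > threshold                          # 5. verdict on the claim
```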
7. Significance and Implications
ArgRAG establishes a new paradigm in retrieval-augmented generation by embedding formal argumentative structure and deterministic reasoning. It markedly improves explainability and robustness to noise and contradiction, and it facilitates user contestability. These features set ArgRAG apart from conventional RAG architectures and position it as a foundational methodology for explainable AI in domains where reliability and auditability are critical (Zhu et al., 26 Aug 2025).