AFEV: Iterative Extraction & Verification
- AFEV is a fact-checking paradigm that decomposes complex claims into atomic factual units and verifies them through structured, iterative processing.
- It employs dense retrieval and cross-encoder reranking to select relevant evidence and dynamic in-context demonstrations for precise verification.
- The approach mitigates error propagation through closed-loop iterative refinement and transparent aggregation of subclaim results.
Iterative Extraction and Verification (AFEV) is a modular fact-checking paradigm in which a complex claim is decomposed into atomic factual units that are verified through fine-grained evidence retrieval, adaptive in-context reasoning, and dynamic aggregation of intermediate results. AFEV is designed to mitigate reasoning failures, control error propagation, and enhance interpretability when verifying multi-hop, compositional, and adversarial claims.
1. Formal Structure and Notation
Let $C$ denote a complex natural-language claim, $D$ an external textual corpus, and $\mathbb{C}$ a database of labeled demonstration claims. The AFEV pipeline is defined by the following iterative sequence:
- Decomposition: $C \rightarrow \{F_1, F_2, \ldots, F_T\}$, where each $F_t$ is an atomic fact extracted from $C$ via auto-regressive or conditional generation.
- Evidence Retrieval: For each $F_t$,
  - Retrieve a candidate set $E'_t$ using dense dual-encoder cosine similarity (with shared encoder $\phi$): $E'_t = \operatorname{top-}k_{\,e \in D}\, \cos\big(\phi(F_t), \phi(e)\big)$.
  - Rerank $E'_t$ to select $E_t$ using a cross-encoder reranker trained with the InfoNCE loss (a training-loss sketch is given at the end of this section):

  $$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\big(s(F_t, e^{+})\big)}{\exp\big(s(F_t, e^{+})\big) + \sum_{e^{-}} \exp\big(s(F_t, e^{-})\big)},$$

  where $e^{+}$ is the true evidence, $e^{-}$ ranges over negatives, and $s(\cdot,\cdot)$ is the cross-encoder relevance score.
- Dynamic Demonstration Selection: For each $F_t$, select context-specific demonstrations $A_t$ from the database of labeled claims $\mathbb{C}$, maximizing semantic similarity with $F_t$.
- Reasoning: For each $F_t$, aggregate its retrieved evidence $E_t$ and demonstrations $A_t$ to produce a fact-level label $y_t$ and rationale $r_t$: $(y_t, r_t) = \operatorname{Reasoner}(F_t, C, E_t, A_t)$.
- Iterative Refinement: Fact extraction at step $t{+}1$ is conditioned on all previous outputs, $F_{t+1} = \operatorname{Extractor}\big(C, \{(F_i, y_i, r_i)\}_{i \le t}\big)$, until a coverage-based STOP criterion is met.
- Aggregation and Final Decision: The set of fact-level labels $\{y_t\}$ and rationales $\{r_t\}$ are composed to form the overall verdict $y^{\star} = \operatorname{Aggregate}\big(\{F_t\}, \{y_t\}, \{r_t\}\big)$.
This design facilitates both interpretability and dynamic correction by tightly coupling decomposition, retrieval, and verification.
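The reranker objective above is the standard InfoNCE contrastive loss. Below is a minimal PyTorch sketch of that loss, assuming the cross-encoder has already produced scalar scores for one positive and $N$ negative passages per fact; the batch layout and variable names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def infonce_loss(pos_score: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE over one positive and N negatives per fact.

    pos_score:  shape (B,)   -- cross-encoder logits s(F_t, e+)
    neg_scores: shape (B, N) -- cross-encoder logits s(F_t, e-) for N negatives
    """
    # Place the positive at index 0 of each row of logits.
    logits = torch.cat([pos_score.unsqueeze(1), neg_scores], dim=1)  # (B, N+1)
    targets = torch.zeros(logits.size(0), dtype=torch.long)          # positive index
    return F.cross_entropy(logits, targets)

# Toy usage: batch of 2 facts, 3 negatives each.
loss = infonce_loss(torch.randn(2), torch.randn(2, 3))
```

Cross-entropy against a fixed target of index 0 is exactly $-\log$ of the softmax mass assigned to the positive, which is the InfoNCE objective written above.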
2. Iterative Algorithmic Workflow
The full AFEV protocol is implemented as a closed-loop iterative process:
```python
def AFEV_FactVerification(C, D, ℂ):
    # C: complex claim, D: evidence corpus, ℂ: database of labeled demonstration claims
    F_list, Y_list, R_list = [], [], []
    while True:
        # Extract the next atomic fact, conditioned on the full verification history.
        F_t = Extractor(C, F_list, Y_list, R_list)
        if F_t == "STOP":  # coverage-based stopping criterion
            break
        F_list.append(F_t)
        # Two-stage evidence selection: dense retrieval, then cross-encoder reranking.
        E_t_prime = retrieve_top_k(D, F_t)
        E_t = Reranker(E_t_prime, F_t)
        # Dynamic in-context demonstrations for this subclaim.
        A_t = select_demonstrations(ℂ, F_t)
        # Fact-level verdict y_t and natural-language rationale r_t.
        y_t, r_t = Reasoner(F_t, C, E_t, A_t)
        Y_list.append(y_t)
        R_list.append(r_t)
    # Compose subclaim labels and rationales into the final verdict.
    y_star = Aggregate(F_list, Y_list, R_list)
    return y_star, Y_list, R_list
```
- Each extraction step exploits prior subclaim-verification pairs, closing the error-propagation loop.
- The iterative approach ends adaptively once the atomic fact set collectively covers the full semantic content of $C$; a toy instantiation of the loop is sketched below.
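To make the control flow concrete, the toy instantiation below plugs trivial stand-in components into `AFEV_FactVerification`. Every stub (string-split extraction, word-overlap retrieval, substring reasoning) is a deliberately naive placeholder for what would be an LLM or trained model in AFEV, and none of the heuristics are from the paper.

```python
# Illustrative stubs only -- each would be an LLM call or trained model in AFEV.
def Extractor(C, facts, labels, rationales):
    # Emit one pre-split subclaim per turn, then signal coverage.
    pending = [f for f in C.split(". ") if f not in facts]
    return pending[0] if pending else "STOP"

def retrieve_top_k(D, fact, k=5):
    # Rank corpus passages by crude word overlap with the subclaim.
    return sorted(D, key=lambda e: len(set(fact.split()) & set(e.split())),
                  reverse=True)[:k]

def Reranker(candidates, fact):
    return candidates[:2]  # keep the top reranked evidence

def select_demonstrations(demo_db, fact):
    return demo_db[:2]

def Reasoner(fact, C, evidence, demos):
    label = "SUPPORTED" if any(fact in e for e in evidence) else "REFUTED"
    return label, f"Checked '{fact}' against {len(evidence)} passages."

def Aggregate(facts, labels, rationales):
    return "TRUE" if all(y == "SUPPORTED" for y in labels) else "FALSE"

corpus = ["Paris is the capital of France", "The Seine flows through Paris"]
verdict, labels, rationales = AFEV_FactVerification(
    "Paris is the capital of France. The Seine flows through Paris", corpus, [])
print(verdict)  # "TRUE"
```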
3. Dynamic Refinement and Error Control
AFEV addresses error accumulation through:
- Closed-loop Extraction: Each new atomic fact $F_{t+1}$ is generated with explicit conditioning on the entire verification history $\{(F_i, y_i, r_i)\}_{i \le t}$. This dynamically corrects prior faulty decompositions and prevents irrecoverable branching errors.
- STOP Criterion: Extraction halts when all semantic units of the original claim are accounted for, preventing both under- and over-decomposition (one plausible coverage check is sketched at the end of this section).
- Supervised Evidence Filtering: A learned reranker suppresses low-quality or semantically irrelevant evidence early in the pipeline, minimizing contamination at the reasoning phase.
These mechanisms jointly prevent the uncontrolled noise propagation widely observed in one-shot decomposition strategies.
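The paper describes the STOP test only as coverage-based. One plausible realization, sketched here as an assumption rather than the authors' method, halts extraction once the concatenated atomic facts are semantically close to the full claim under a sentence encoder; the checkpoint name and threshold `tau` are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def should_stop(claim: str, facts: list[str], tau: float = 0.9) -> bool:
    """Halt when the accumulated atomic facts cover the claim's semantics."""
    if not facts:
        return False
    claim_vec = encoder.encode(claim, convert_to_tensor=True)
    facts_vec = encoder.encode(" ".join(facts), convert_to_tensor=True)
    return util.cos_sim(claim_vec, facts_vec).item() >= tau
```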
4. Fine-Grained Retrieval and In-Context Demonstrations
Retrieval is performed in a two-stage manner:
- Dense Retrieval: A bi-encoder computes vector representations and ranks all corpus candidates; the retrieval depth $k$ is typically set to $5$.
- Cross-Encoder Reranking: The top-$k$ passages undergo pairwise reranking. The best evidence sentences (usually $1$–$2$) are retained for the reasoning module.
- Dynamic Demonstrations: For each $F_t$, the previously validated claims with the highest semantic similarity serve as in-context examples (typically $1$ or $2$). This on-the-fly instance retrieval aligns LLM behavior with the specifics of the current subclaim.
The above provides both coverage and context specificity for nuanced multi-hop verification.
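A compact sketch of the two-stage retrieval and dynamic demonstration selection using the sentence-transformers library; the specific checkpoints (`all-MiniLM-L6-v2`, `cross-encoder/ms-marco-MiniLM-L-6-v2`) and the cut-offs $k=5$, $m=2$, $n=2$ follow the typical values quoted above but are otherwise illustrative.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")                   # illustrative
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")   # illustrative

def retrieve_and_rerank(fact: str, corpus: list[str], k: int = 5, m: int = 2) -> list[str]:
    """Stage 1: dense top-k by cosine similarity. Stage 2: cross-encoder rerank to m."""
    corpus_vecs = bi_encoder.encode(corpus, convert_to_tensor=True)
    fact_vec = bi_encoder.encode(fact, convert_to_tensor=True)
    hits = util.semantic_search(fact_vec, corpus_vecs, top_k=k)[0]
    candidates = [corpus[h["corpus_id"]] for h in hits]
    scores = cross_encoder.predict([(fact, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [c for c, _ in ranked[:m]]

def select_demonstrations(fact: str, demo_db: list[tuple[str, str]], n: int = 2):
    """Pick the n labeled (claim, label) pairs most similar to the current subclaim."""
    claims = [c for c, _ in demo_db]
    sims = util.cos_sim(bi_encoder.encode(fact, convert_to_tensor=True),
                        bi_encoder.encode(claims, convert_to_tensor=True))[0]
    top = sims.argsort(descending=True)[:n]
    return [demo_db[i] for i in top.tolist()]
```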
5. Benchmark Experiments and Empirical Results
Evaluation across the LIAR-PLUS, HOVER, PolitiHop, RAWFC, and LIAR benchmarks demonstrates consistent gains (LA = label accuracy; RAWFC and LIAR are reported with F1 only):
| Dataset | Baseline LA / F1 | AFEV LA / F1 |
|---|---|---|
| LIAR-PLUS | 82.10 / 80.78 | 83.73 / 83.12 |
| HOVER | 76.98 / 76.89 | 78.87 / 78.76 |
| PolitiHop | 72.34 / 55.80 | 74.14 / 57.69 |
| RAWFC (F1) | 57.3 | 60.2 |
| LIAR (F1) | 42.0 | 43.9 |
Ablation studies confirm that atomic fact extraction (+1.1 to +1.9 F1), iterative extraction (+0.5 to +1.1 F1), rationale generation, reranking, and dynamic demonstrations each yield measurable improvements. Optimal performance is achieved with $1$–$2$ reranked evidence sentences and $1$–$2$ demonstrations per fact-level query. Efficiency is preserved: the closed-loop variant incurs only a modest runtime overhead relative to one-shot baselines.
6. Interpretability and Case Analysis
AFEV natively produces a fine-grained audit trail:
- Each subclaim is paired with the retrieved evidence, in-context demonstration, rationale, and atomic label.
- Intermediate errors or ambiguities can be directly traced to precise subcomponents, facilitating targeted correction.
Case studies illustrate complex sports-statistics claims disassembled into player-year facts, with independent count-verification and explicit cross-references, ultimately yielding a transparent, human-readable decision sequence.
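One natural way to materialize this audit trail is a per-subclaim record; the dataclass below is a minimal sketch (field names are illustrative, not from the paper).

```python
from dataclasses import dataclass

@dataclass
class SubclaimRecord:
    """One row of the AFEV audit trail (field names illustrative)."""
    fact: str                  # atomic subclaim F_t
    evidence: list[str]        # reranked evidence E_t
    demonstrations: list[str]  # in-context examples A_t
    label: str                 # fact-level verdict y_t
    rationale: str             # natural-language justification r_t

def render_trail(records: list[SubclaimRecord]) -> str:
    """Human-readable decision sequence, one block per subclaim."""
    lines = []
    for i, rec in enumerate(records, 1):
        lines.append(f"[{i}] {rec.fact} -> {rec.label}")
        lines.append(f"    rationale: {rec.rationale}")
        for e in rec.evidence:
            lines.append(f"    evidence: {e}")
    return "\n".join(lines)
```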
7. Significance, Limitations, and Future Directions
AFEV demonstrates that iterative atomic factorization and adaptive reasoning outperform static or monolithic pipelines, particularly for multi-hop, ambiguous, or adversarial claims (Zheng et al., 9 Jun 2025). Its explicit, chained reasoning and evidence tracking address key issues in factual verification: brittle error propagation, noisy retrieval, and interpretability bottlenecks.
Limitations include increased dependence on retrieval quality, potential slowdowns for claims requiring many atomic units, and sensitivity to noise in the demonstration database. Future work proposed in (Zheng et al., 9 Jun 2025) includes tighter coupling with retrieval-learning objectives, extension to deeper multi-hop reasoning with richer aggregation, and application of the retrieve–edit–aggregate loop to heterogeneous domains (e.g., code, scientific fact-checking). Empirical evidence from recent multi-hop benchmarks supports continued development in this direction.