
AFEV: Iterative Extraction & Verification

Updated 27 November 2025
  • AFEV is a fact-checking paradigm that decomposes complex claims into atomic factual units and verifies them through structured, iterative processing.
  • It employs dense retrieval and cross-encoder reranking to select relevant evidence and dynamic in-context demonstrations for precise verification.
  • The approach mitigates error propagation through closed-loop iterative refinement and transparent aggregation of subclaim results.

Iterative Extraction and Verification (AFEV) is a modular fact-checking paradigm in which a complex claim is decomposed into atomic factual units, each of which is verified through fine-grained evidence retrieval, adaptive in-context reasoning, and dynamic aggregation of intermediate results. AFEV is designed to mitigate reasoning failures, control error propagation, and enhance interpretability when verifying multi-hop, compositional, and adversarial claims.

1. Formal Structure and Notation

Let $C$ denote a complex natural-language claim and $D$ an external textual corpus. The AFEV pipeline is defined by the following iterative sequence:

  • Decomposition: $F = \{F_1, \dots, F_T\}$, where each $F_t$ is an atomic fact extracted from $C$ via auto-regressive or conditional generation.
  • Evidence Retrieval: For each $F_t$,

    • Retrieve $E'_t = \operatorname{Top}_{k'}\{\, e_j \in D \mid \operatorname{score}(e_j, F_t) \,\}$ using dense dual-encoder cosine similarity:

      $$\operatorname{score}(e_j, F_t) = \frac{f(e_j) \cdot f(F_t)}{\|f(e_j)\| \, \|f(F_t)\|}$$

    • Rerank $E'_t$ to select $E_t$ using a cross-encoder reranker trained with an InfoNCE loss:

      $$\mathcal{L}_r = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{f(F_i) \cdot f(e^+)/\tau}}{\sum_{j=1}^{m} e^{f(F_i) \cdot f(e_j^-)/\tau}}$$

      where $e^+$ is the gold evidence for $F_i$ and $e_j^-$ are sampled negatives.

  • Dynamic Demonstration Selection: For each $F_t$, select dynamic, context-specific demonstrations $A_t = \{a_1^t, \dots, a_d^t\}$ from a database of labeled claims $\mathcal{C}$, maximizing semantic similarity with $F_t$.
  • Reasoning: For each $F_t$, aggregate its retrieved evidence $E_t$ and demonstrations $A_t$ to produce a fact-level label $y_t \in \{\text{True}, \text{False}, \text{Unverifiable}\}$ and rationale $r_t$:

    $$(y_t, r_t) = \operatorname{Reasoner}(F_t, C, E_t, A_t)$$

  • Iterative Refinement: Extraction of $F_{t+1}$ is conditioned on the previous outputs:

    $$F_{t+1} = \operatorname{Extractor}(C, F_{1:t}, y_{1:t}, r_{1:t})$$

    until a coverage-based STOP criterion is met.

  • Aggregation and Final Decision: The labels $\{y_1, \dots, y_T\}$ and rationales $\{r_1, \dots, r_T\}$ are composed to form the overall verdict $y^*$:

    $$y^* = \operatorname{Aggregate}(\{y_t\}_{t=1}^{T}, \{r_t\}_{t=1}^{T})$$

This design facilitates both interpretability and dynamic correction by tightly coupling decomposition, retrieval, and verification.
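As a concrete reading of the retrieval objective above, here is a minimal PyTorch sketch of the cosine scoring and the InfoNCE reranker loss exactly as displayed; the encoder $f$, the batch shapes, and the temperature value are illustrative assumptions, not the reference implementation.

```python
import torch

def cosine_score(fact_emb, evidence_embs):
    # score(e_j, F_t) for all m candidates at once:
    # evidence_embs is [m, d], fact_emb is [d]; returns [m] cosine similarities.
    fact = fact_emb / fact_emb.norm()
    ev = evidence_embs / evidence_embs.norm(dim=1, keepdim=True)
    return ev @ fact

def info_nce_reranker_loss(fact_embs, pos_embs, neg_embs, tau=0.05):
    # L_r as written above: positive pair in the numerator, the m negatives
    # in the denominator. fact_embs, pos_embs: [N, d]; neg_embs: [N, m, d].
    pos = (fact_embs * pos_embs).sum(-1) / tau                    # [N]
    neg = torch.einsum("nd,nmd->nm", fact_embs, neg_embs) / tau   # [N, m]
    return -(pos - torch.logsumexp(neg, dim=1)).mean()

# Toy shapes only; real embeddings come from the trained encoders.
N, m, d = 4, 7, 16
loss = info_nce_reranker_loss(torch.randn(N, d), torch.randn(N, d), torch.randn(N, m, d))
```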

2. Iterative Algorithmic Workflow

The full AFEV protocol is implemented as a closed-loop iterative process:

def AFEV_FactVerification(C, D, claim_db):
    """Closed-loop AFEV: C is the complex claim, D the evidence corpus,
    and claim_db the database of labeled claims used for demonstrations."""
    F_list, Y_list, R_list = [], [], []
    while True:
        # Extract the next atomic fact, conditioned on all prior facts,
        # labels, and rationales (closed-loop refinement).
        F_t = Extractor(C, F_list, Y_list, R_list)
        if F_t == "STOP":                            # coverage-based stopping criterion
            break
        F_list.append(F_t)
        E_t_prime = retrieve_top_k(D, F_t)           # dense retrieval (top-k')
        E_t = Reranker(E_t_prime, F_t)               # cross-encoder reranking (top-k)
        A_t = select_demonstrations(claim_db, F_t)   # dynamic in-context examples
        y_t, r_t = Reasoner(F_t, C, E_t, A_t)        # fact-level label and rationale
        Y_list.append(y_t)
        R_list.append(r_t)
    y_star = Aggregate(F_list, Y_list, R_list)       # compose the overall verdict
    return y_star, Y_list, R_list

  • Each extraction step exploits prior subclaim-verification pairs, closing the error-propagation loop.
  • The iterative approach ends adaptively once the atomic fact set collectively covers the full semantic content of $C$.
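To make the loop's interfaces concrete, the toy run below plugs in deliberately simple stand-ins for each component; every heuristic here (clause splitting, lexical overlap, substring matching) is a hypothetical placeholder, since in AFEV the extractor and reasoner are LLM calls and the retrieval stages are trained encoders.

```python
def Extractor(C, F_list, Y_list, R_list):
    # Toy extractor: emit one clause per call, then signal STOP once all are covered.
    pending = [f.strip() for f in C.split(" and ") if f.strip() not in F_list]
    return pending[0] if pending else "STOP"

def retrieve_top_k(D, F_t, k_prime=5):
    # Lexical-overlap stand-in for dense bi-encoder retrieval.
    overlap = lambda e: len(set(e.lower().split()) & set(F_t.lower().split()))
    return sorted(D, key=overlap, reverse=True)[:k_prime]

def Reranker(E_prime, F_t, k=2):
    return E_prime[:k]  # stand-in for the cross-encoder reranker

def select_demonstrations(claim_db, F_t, d=1):
    return claim_db[:d]  # stand-in for similarity-based demonstration selection

def Reasoner(F_t, C, E_t, A_t):
    # Toy reasoner: "True" if the fact appears verbatim in the evidence.
    y_t = "True" if any(F_t.lower() in e.lower() for e in E_t) else "Unverifiable"
    return y_t, f"Checked {len(E_t)} evidence sentences for: {F_t}"

def Aggregate(F_list, Y_list, R_list):
    if "False" in Y_list:
        return "False"
    return "Unverifiable" if "Unverifiable" in Y_list else "True"

claim = "Paris is the capital of France and Paris hosted the 1900 Summer Olympics"
corpus = ["Paris is the capital of France.",
          "Paris hosted the 1900 Summer Olympics.",
          "Lyon is a city in France."]
print(AFEV_FactVerification(claim, corpus, claim_db=["example labeled claim"]))
```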

3. Dynamic Refinement and Error Control

AFEV addresses error accumulation through:

  • Closed-loop Extraction: Each new atomic fact is generated with explicit conditioning on the full verification history $(F_{1:t-1}, y_{1:t-1}, r_{1:t-1})$. This dynamically corrects prior faulty decompositions and prevents irrecoverable branching errors.
  • STOP Criterion: Extraction halts when all semantic units of the original claim are accounted for, preventing both under- and over-decomposition.
  • Supervised Evidence Filtering: A learned reranker suppresses low-quality or semantically irrelevant evidence early in the pipeline, minimizing contamination at the reasoning phase.

These mechanisms jointly prevent the uncontrolled noise propagation widely observed in one-shot decomposition strategies.
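The STOP decision itself is delegated to the extractor; purely as an illustration of what a coverage-based check could look like, the lexical heuristic below tests whether the extracted facts jointly mention the claim's content words (the threshold and stop-word list are arbitrary assumptions, not the paper's criterion).

```python
def should_stop(claim: str, atomic_facts: list[str], threshold: float = 0.9) -> bool:
    # Hypothetical coverage test: stop once the extracted facts jointly
    # cover (nearly) all content words of the original claim.
    stop_words = {"the", "a", "an", "and", "of", "in", "to", "is", "was", "that"}
    content = lambda s: {w for w in s.lower().split() if w not in stop_words}
    claim_terms = content(claim)
    if not claim_terms:
        return True
    covered = set().union(*(content(f) for f in atomic_facts))
    return len(claim_terms & covered) / len(claim_terms) >= threshold
```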

4. Fine-Grained Retrieval and In-Context Demonstrations

Retrieval is performed in a two-stage manner:

  • Dense Retrieval: A bi-encoder computes vector representations and ranks all candidates; $k'$ is typically set to 5.
  • Cross-Encoder Reranking: The top-$k'$ passages undergo pairwise reranking, and the best $k$ evidence sentences (usually $k = 2$) are retained for the reasoning module.
  • Dynamic Demonstrations: For each $F_t$, the $d$ previously validated claims with the highest semantic similarity serve as in-context examples (typically $d = 1$ or $2$). This on-the-fly instance retrieval aligns LLM behavior with the specifics of the current subclaim.

The above provides both coverage and context specificity for nuanced multi-hop verification.
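A schematic sketch of this two-stage retrieval and demonstration selection follows, with the bi-encoder (encode) and cross-encoder (rerank_score) left as abstract callables and the defaults $k' = 5$, $k = 2$, $d = 1$ taken from the values quoted above; the function names and interfaces are illustrative assumptions.

```python
import numpy as np

def two_stage_retrieve(fact, corpus, encode, rerank_score, k_prime=5, k=2):
    # Stage 1: dense bi-encoder retrieval by cosine similarity (keep top-k').
    embs = np.stack([encode(p) for p in corpus])
    q = encode(fact)
    sims = embs @ q / (np.linalg.norm(embs, axis=1) * np.linalg.norm(q) + 1e-9)
    shortlist = [corpus[i] for i in np.argsort(-sims)[:k_prime]]
    # Stage 2: cross-encoder reranking of the shortlist (keep top-k).
    scores = np.array([rerank_score(fact, p) for p in shortlist])
    return [shortlist[i] for i in np.argsort(-scores)[:k]]

def select_similar_demonstrations(fact, labelled_claims, encode, d=1):
    # Pick the d labeled (claim, label) pairs most similar to the atomic fact.
    q = encode(fact)
    sims = np.array([float(encode(c) @ q) for c, _ in labelled_claims])
    return [labelled_claims[i] for i in np.argsort(-sims)[:d]]
```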

5. Benchmark Experiments and Empirical Results

Evaluation across LIAR-PLUS, HOVER, PolitiHop, RAWFC, and LIAR benchmarks demonstrates robust gains:

| Dataset | Baseline (LA / F1) | AFEV (LA / F1) |
|---|---|---|
| LIAR-PLUS | 82.10 / 80.78 | 83.73 / 83.12 |
| HOVER | 76.98 / 76.89 | 78.87 / 78.76 |
| PolitiHop | 72.34 / 55.80 | 74.14 / 57.69 |
| RAWFC (F1 only) | 57.3 | 60.2 |
| LIAR (F1 only) | 42.0 | 43.9 |

Ablation studies confirm that atomic fact extraction (+1.1 to +1.9 F1), iterative extraction (+0.5 to +1.1), rationales, reranking, and demonstrations all yield measurable improvements. Optimal performance is achieved with $k = 1$–$2$ reranked evidence sentences and $d = 1$–$2$ demonstrations per fact-level query. Efficiency is preserved: the closed-loop variant increases runtime by less than 25% compared with one-shot baselines.

6. Interpretability and Case Analysis

AFEV natively produces a fine-grained audit trail:

  • Each subclaim is paired with the retrieved evidence, in-context demonstration, rationale, and atomic label.
  • Intermediate errors or ambiguities can be directly traced to precise subcomponents, facilitating targeted correction.

Case studies illustrate complex sports-statistics claims disassembled into player-year facts, with independent count-verification and explicit cross-references, ultimately yielding a transparent, human-readable decision sequence.
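One possible, non-prescriptive way to materialize that audit trail in code is a per-subclaim record; the field names below are illustrative rather than the paper's data format.

```python
from dataclasses import dataclass

@dataclass
class SubclaimRecord:
    # One audit-trail entry per atomic fact F_t.
    atomic_fact: str
    evidence: list[str]        # reranked evidence E_t
    demonstrations: list[str]  # in-context examples A_t
    label: str                 # "True" | "False" | "Unverifiable"
    rationale: str             # r_t, the reasoner's justification
```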

7. Significance, Limitations, and Future Directions

AFEV demonstrates that iterative atomic-fact decomposition combined with adaptive reasoning outperforms static or monolithic pipelines, particularly for multi-hop, ambiguous, or adversarial claims (Zheng et al., 9 Jun 2025). The explicit, chained reasoning and evidence tracking address key issues in factual verification: brittle error propagation, noisy retrieval, and interpretability bottlenecks.

Limitations include increased dependence on retrieval quality, potential slowdowns for claims requiring many atomic units, and sensitivity to noise in the demonstration database. Future work proposed in (Zheng et al., 9 Jun 2025) includes tighter coupling with retrieval-learning objectives, extending multi-hop reasoning with deeper aggregation, and applying the retrieve–edit–aggregate loop to heterogeneous domains (e.g., code and scientific fact-checking). Empirical evidence from recent multi-hop benchmarks supports continued development in this direction.
