
Atomic Fact Extraction and Verification Framework

Updated 1 December 2025
  • Atomic Fact Extraction and Verification is a framework that breaks complex, multi-hop claims into atomic, verifiable facts using iterative extraction and feedback-driven refinement.
  • It employs a two-stage evidence retrieval process with coarse bi-encoder filtering and cross-encoder reranking for precise evidence selection.
  • The adaptive system aggregates atomic fact verifications with transparent, interpretable rationales to enhance overall claim accuracy.

Atomic Fact Extraction and Verification (AFEV) is a framework for decomposing and verifying complex natural-language claims through the iterative extraction of atomic facts, fine-grained evidence retrieval, and context-sensitive reasoning. By breaking multi-hop claims into minimal, context-aware sub-facts, AFEV mitigates error propagation, enhances interpretability, and demonstrates improved accuracy over traditional one-shot or static decomposition approaches, particularly in scenarios requiring multi-hop inference or fragmented evidence (Zheng et al., 9 Jun 2025).

1. Core Principles and Motivation

Atomic Fact Extraction and Verification addresses core challenges in automated fact verification:

  • Traditional single-step methods frequently struggle with multi-hop claims, often accumulating errors by attempting to verify the entire claim holistically, leading to noisy evidence retrieval and incoherent sub-fact decomposition.
  • Static decomposition is brittle: rule-based or one-shot LLM decompositions often fail to utilize partial verification feedback and can redundantly or incoherently split claims.
  • Shallow, surface-level semantic retrieval exacerbates evidence contamination, diluting crucial information necessary for precise, multi-hop reasoning.
  • Rigid, prompt-based LLM reasoning frameworks lack flexibility for claims with diverse logical requirements (Zheng et al., 9 Jun 2025).

AFEV’s feedback-driven, iterative decomposition approach dynamically refines claim understanding and drives adaptive, context-specific reasoning and evidence retrieval for each atomic fact, offering substantial advantages over prior static pipelines.

2. Iterative Atomic Fact Extraction

In AFEV, complex claims $C$ are decomposed into atomic facts $\{F_1, F_2, \dots, F_T\}$ through an iterative, LLM-based extraction loop:

  • At each iteration $t$, the Extractor decides whether the previously extracted facts $\{F_1, \dots, F_{t-1}\}$ suffice for full claim coverage; if not, a new atomic fact $F_t$ is generated, conditioned on all prior sub-facts, their predicted labels $\{y_1, \dots, y_{t-1}\}$, and explicit rationales $\{r_1, \dots, r_{t-1}\}$.
  • The extraction process proceeds until the LLM returns a “STOP” signal. This allows the framework to adaptively control fact granularity based on verifiability feedback (Zheng et al., 9 Jun 2025).

Pseudocode representation:

Input: claim C
Initialize F ← [], y ← [], r ← [], t ← 1
repeat
  prompt ← build_prompt(C, F, y, r)      # condition on prior facts, labels, and rationales
  F_t ← LLM_Extractor(prompt)
  if F_t == STOP:
    break
  y_t, r_t ← Verify(F_t)                 # verification (Sections 3–5) supplies the feedback used at the next iteration
  append F_t to F, y_t to y, r_t to r
  t ← t + 1
until false
Output: atomic facts F with labels y and rationales r

This loop dynamically covers all necessary fact fragments, ensuring maximal verifiability and minimal semantic redundancy.

3. Two-Stage, Fine-Grained Evidence Retrieval and Reranking

Each atomic fact $F_t$ is paired with a targeted, two-stage evidence retrieval procedure:

  • Coarse retrieval: A bi-encoder computes cosine similarity between the embedded atomic fact $F_t$ and corpus candidates, recalling the top $k'$ candidates.
  • Reranking: A cross-encoder reranker, trained with an InfoNCE loss over LLM-annotated positive/negative evidence (with temperature parameter $\tau$; the standard form of this loss is sketched below), applies precision-oriented scoring to return the $k$ most pertinent evidence snippets per atomic fact.
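For reference, the usual InfoNCE objective over an annotated positive snippet $e^+$ and a set of negatives $\mathcal{N}$ takes the form below; this is a generic sketch, and the paper's exact formulation may differ in details such as the scoring function $s(\cdot,\cdot)$:

$$\mathcal{L}_{\text{InfoNCE}} = -\log \frac{\exp\big(s(F_t, e^+)/\tau\big)}{\exp\big(s(F_t, e^+)/\tau\big) + \sum_{e^- \in \mathcal{N}} \exp\big(s(F_t, e^-)/\tau\big)}$$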

This cascade substantially reduces evidence noise and contradictory distractors, which is especially critical in multi-hop settings where spurious sentences propagate errors downstream. In practice, recalling $k' = 5$ candidates and reranking down to $k = 2$ is empirically optimal (Zheng et al., 9 Jun 2025).
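A minimal sketch of this two-stage cascade, assuming off-the-shelf sentence-transformers models (the model names, the corpus format, and the helper name retrieve_evidence are illustrative placeholders, not the paper's trained components):

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Illustrative encoders; the paper's trained bi-encoder and cross-encoder reranker may differ.
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def retrieve_evidence(atomic_fact, corpus, k_prime=5, k=2):
    """Coarse bi-encoder recall of k' candidates, then cross-encoder reranking down to k."""
    # Stage 1: cosine-similarity recall over embedded corpus sentences.
    fact_emb = bi_encoder.encode(atomic_fact, convert_to_tensor=True)
    corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)
    top = util.cos_sim(fact_emb, corpus_emb)[0].topk(min(k_prime, len(corpus)))
    candidates = [corpus[i] for i in top.indices.tolist()]

    # Stage 2: precision-oriented scoring of (fact, candidate) pairs.
    scores = cross_encoder.predict([(atomic_fact, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:k]]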

4. Adaptive Verification with In-Context Demonstrations

To ensure high precision in fact verification, AFEV deploys context-specific, dynamically retrieved in-context demonstrations for each atomic sub-task:

  • A demonstration pool C\mathcal{C} of labeled atomic facts and supporting evidence snippets is maintained.
  • For given FtF_t, the system retrieves the most structurally and semantically similar demonstration(s) through embedding similarity.
  • The LLM-based reasoner is then prompted with the full context: original claim CC, atomic fact FtF_t, reranked evidence EtE_t, and top demonstration(s) AtA_t.

This mechanism allows the model to adapt reasoning strategy per sub-fact, reducing hallucinations and promoting accountable, interpretable rationales (Zheng et al., 9 Jun 2025).
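The demonstration retrieval and prompt assembly can be sketched as follows; the embedder, the demo_pool schema (fields fact, evidence, label, rationale), and both helper names are hypothetical illustrations of the mechanism rather than the paper's exact implementation:

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # illustrative model

def select_demonstration(atomic_fact, demo_pool):
    """Pick the most similar labeled demonstration from the pool by embedding similarity."""
    fact_emb = embedder.encode(atomic_fact)
    demo_embs = embedder.encode([d["fact"] for d in demo_pool])
    # Cosine similarity between the atomic fact and every pooled demonstration.
    sims = demo_embs @ fact_emb / (
        np.linalg.norm(demo_embs, axis=1) * np.linalg.norm(fact_emb) + 1e-9
    )
    return demo_pool[int(np.argmax(sims))]

def build_verifier_prompt(claim, atomic_fact, evidence, demo):
    """Assemble the verification prompt: demonstration, original claim, atomic fact, reranked evidence."""
    return (
        f"Demonstration:\nFact: {demo['fact']}\nEvidence: {demo['evidence']}\n"
        f"Label: {demo['label']}\nRationale: {demo['rationale']}\n\n"
        f"Original claim: {claim}\nAtomic fact: {atomic_fact}\n"
        f"Evidence: {' '.join(evidence)}\n"
        "Answer with a label (True / False / Unverifiable) and a rationale."
    )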

5. Global Aggregation and Decision Rules

For each atomic fact, the reasoner outputs a label $y_t \in \{\text{True}, \text{False}, \text{Unverifiable}\}$ and a fine-grained natural-language rationale $r_t$:

  • The global claim verdict $y^*$ is aggregated by a simple rule: if any atomic $y_t$ is “False,” then $y^* = \text{False}$; if all $y_t$ are “True,” then $y^* = \text{True}$; otherwise the claim is “Unverifiable” (see the sketch after this list).
  • This aggregation protocol ensures that any local error is traceable—atomic fact-level rationales provide full transparency.
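A minimal sketch of this decision rule (the function name and label strings are illustrative):

def aggregate_verdict(atomic_labels):
    """Global verdict from per-fact labels: any False -> False; all True -> True; else Unverifiable."""
    if any(y == "False" for y in atomic_labels):
        return "False"
    if all(y == "True" for y in atomic_labels):
        return "True"
    return "Unverifiable"

# Example: one unverifiable sub-fact leaves the whole claim unverifiable.
print(aggregate_verdict(["True", "Unverifiable", "True"]))  # -> "Unverifiable"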

Computationally, AFEV is efficient: each step operates over short atomic facts and small evidence sets, bi-encoder retrieval scales roughly logarithmically with corpus size when backed by an approximate nearest-neighbor index, and cross-encoder reranking cost is bounded by the small $k'$, enabling practical application at scale (Zheng et al., 9 Jun 2025).

6. Empirical Validation and Benchmarks

AFEV sets new state-of-the-art results in accuracy and interpretability across diverse fact verification benchmarks:

Dataset | Baseline (LA / Macro-F1) | AFEV (LA / Macro-F1) | Metric / Notes
LIAR-PLUS | VMASK (82.62 / 81.46) | 83.73 / 83.12 | +1.11 LA over prior SOTA
HOVER | CURE (76.98 / 76.89) | 78.87 / 78.76 | ≈+2 LA, complex multi-hop claims
PolitiHop | VMASK (72.34 / 55.80) | 74.14 / 57.69 | Multi-hop, political claims
RAWFC | RAFTS (62.8 / 52.6 / 57.3) | 63.3 / 57.6 / 60.2 | F1, scientific (open) claims
LIAR | RAFTS (47.1 / 37.9 / 42.0) | 48.2 / 40.3 / 43.9 | F1, weakly structured claims

(Zheng et al., 9 Jun 2025)

Every atomic fact is accompanied by an explicit rationale, and Figure 1 in (Zheng et al., 9 Jun 2025) shows that iterative decomposition and adaptive reasoning produce more reliable fact chains than baseline LLM prompting and static decomposition pipelines.

7. Interpretability, Practical Impact, and Extensions

AFEV’s chain of atomic fact extractions and their per-fact rationales confer full transparency over the verification process, supporting robust human-in-the-loop analysis and post-hoc auditing.

This partitioned, context- and feedback-driven design is extensible to domains requiring structured, multi-modal, or scientific reasoning, as with table claim verification and skill-chaining in scientific contexts (Zhang et al., 8 Jun 2025), as well as in languages and settings represented by datasets like CFEVER (Lin et al., 20 Feb 2024).

Open challenges include scaling atomic decomposition to ambiguous or ill-formed claims, optimizing dynamic demonstration selection for low-resource domains, and integrating symbolic reasoning or external knowledge graphs to extend beyond surface-level evidence retrieval.


In summary, Atomic Fact Extraction and Verification provides a principled, modular, and empirically validated solution to the fragmentation, retrieval, and verification challenges that have historically limited the accuracy and interpretability of automated fact-checking in multi-hop, ambiguous, or highly contextual settings (Zheng et al., 9 Jun 2025).
