
Fact-Based Decomposition Methods

Updated 23 November 2025
  • Fact-based decomposition methods are algorithmic strategies that break complex assertions into minimal, self-contained atomic facts to facilitate efficient verification and reasoning.
  • They combine formal foundations, graph-based models, and reinforcement learning to optimize the granularity and accuracy of sub-claims.
  • These methods are applied in fact-checking, knowledge base completion, and domain-specific extraction such as clinical and temporal fact retrieval.

Fact-based decomposition methods encompass a spectrum of algorithmic and formal strategies for breaking down complex natural language claims, answers, or knowledge base queries into simpler, minimal units—commonly referred to as atomic facts or sub-claims—for the purposes of retrieval, verification, reasoning, and evaluation. This paradigm has become central in modern fact-checking, knowledge base completion, NLI-based factuality scoring, attributed QA, and specialized domains such as temporal or clinical fact extraction. Below, the principal frameworks, definitions, methodologies, error taxonomies, empirical evaluations, and design principles are described with reference to the latest research on arXiv.

1. Formal Foundations: Atomic Facts and Decomposition Functions

Fact-based decomposition centers on the principled partitioning of an input—be it claim, sentence, or knowledge base query—into atomic facts: units that are minimal (cannot be further decomposed), independent (self-contained for verification), and simple (include only what is strictly necessary for their assertion) (Schmidt et al., 1 Sep 2025). The mathematical formalism can be summarized:

  • Given text $T$ (claim, answer, or premise), a decomposition function $f : T \to \{f_1, \dots, f_N\}$ yields a set $F$ of atomic facts.
  • For formal datasets (e.g., knowledge bases), multi-facet decomposition is used: each candidate fact $(h, r, t)$ is scored by sub-models for head–relation, tail–relation, and tail–inference facets, aggregating evidence across modalities (Fu et al., 2019).

Atomicity is operationally assessed via the number of irreducible facts per subclaim or via the log-scaled atomicity metric $\mathrm{atomicity}(c) = \log_2(\#\,\text{facts in } c)$ (Lu et al., 19 Mar 2025). In procedural pipelines, atomic facts are produced either extractively (span mapping over input text (Popovič et al., 23 Sep 2025)) or abstractively (LLM-generated minimal sentences (Wanner et al., 18 Mar 2024, Yan et al., 22 Oct 2024)).
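
The definitions above can be made concrete with a short sketch. The conjunction-splitting decomposer below is a deliberately naive, assumed stand-in for the LLM- or span-based extractors used in the cited works; only the atomicity formula follows the definition given above.

```python
import math
import re
from typing import List

def decompose(text: str) -> List[str]:
    """Toy decomposition function f: T -> {f_1, ..., f_N}.
    A real system would use an LLM prompt or an extractive span model;
    naive splitting on conjunctions is used here purely for illustration."""
    parts = re.split(r"\band\b|;", text)
    return [p.strip() for p in parts if p.strip()]

def atomicity(subclaim: str) -> float:
    """Log-scaled atomicity: log2 of the number of irreducible facts in a
    subclaim (0.0 when the subclaim is already atomic)."""
    return math.log2(max(len(decompose(subclaim)), 1))

claim = ("Marie Curie won the Nobel Prize in Physics "
         "and the Nobel Prize in Chemistry")
print(decompose(claim))   # two candidate atomic facts
print(atomicity(claim))   # log2(2) = 1.0
```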

2. Graph, Programmatic, and Template-Guided Decomposition

Several frameworks formalize decomposition as graph or program construction:

  • Graph-based Decomposition (GraphFC): Claims and evidence are encoded as triplet graphs $G = (V, E)$, where each edge reflects an atomic ⟨subject, predicate, object⟩ relationship. Verification is performed via guided planning over the graph, incrementally resolving unknowns and enforcing cross-fact relational constraints, yielding SOTA macro-F1 in multi-hop fact-checking (Huang et al., 10 Mar 2025). A minimal sketch of this triplet-graph view follows this list.
  • Program-guided Decomposition: Complex statements over tables or structured data are parsed into executable programs; their parse tree guides the splitting into sub-queries reflecting conjunction, comparative, superlative, and uniqueness operators (Yang et al., 2021).
  • Template-based Question Decomposition: Systems such as QuestGen fine-tune Seq2Seq models on mixed human-synthetic datasets to generate decomposition questions, functioning as explicit intermediate evidence for retrieval + NLI pipelines; manual evaluation confirms machine-generated questions can surpass human-written queries in exhaustiveness and downstream support (Setty et al., 31 Jul 2024).
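
The triplet-graph view can be illustrated with a minimal sketch, assuming a toy set of gold triples in place of retrieved evidence; GraphFC's guided planning is reduced here to ordered edge resolution with entity binding, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str

def claim_to_graph(triplets: List[Tuple[str, str, str]]) -> List[Edge]:
    """Encode a decomposed claim as <subject, predicate, object> edges of G = (V, E)."""
    return [Edge(s, p, o) for s, p, o in triplets]

def verify_claim(graph: List[Edge], evidence: Set[Tuple[str, str, str]]) -> bool:
    """Resolve edges in order so that entities grounded by earlier edges
    ("?x" marks an unknown) constrain later ones; the evidence set of gold
    triples stands in for retrieved documents."""
    bindings = {}
    for edge in graph:
        s = bindings.get(edge.subject, edge.subject)
        o = bindings.get(edge.obj, edge.obj)
        match = next(
            (ev for ev in evidence
             if ev[1] == edge.predicate
             and (s.startswith("?") or ev[0] == s)
             and (o.startswith("?") or ev[2] == o)),
            None,
        )
        if match is None:
            return False                       # one refuted edge refutes the whole claim
        if s.startswith("?"):
            bindings[edge.subject] = match[0]  # ground the unknown entity
        if o.startswith("?"):
            bindings[edge.obj] = match[2]
    return True

evidence = {
    ("Ada Lovelace", "wrote notes on", "Analytical Engine"),
    ("Analytical Engine", "designed by", "Charles Babbage"),
}
graph = claim_to_graph([
    ("Ada Lovelace", "wrote notes on", "?x"),
    ("?x", "designed by", "Charles Babbage"),
])
print(verify_claim(graph, evidence))           # True
```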

3. Iterative and Adaptive Decomposition Frameworks

Recent advances prioritize dynamic, feedback-driven decomposition over static strategies:

  • Iterative Extraction and Verification (AFEV): Claims are decomposed incrementally, with each atomic fact verified via retrieval and context-specific demonstrations; extraction is conditioned on prior labels and rationales, allowing adaptive granularity and error correction. Each iteration involves extraction, retrieval, reranking, verification, and aggregation, outperforming static methods on accuracy and interpretability metrics (Zheng et al., 9 Jun 2025). A schematic version of this loop is sketched after this list.
  • Reinforcement Learning for Optimal Atomicity: The dynamic decomposition framework formalizes policy optimization for downstream verification accuracy as a bilevel NP-hard objective, solved via RL (PPO) using feedback from verifiers to adjust atomicity levels of subclaims, achieving statistically significant improvements in both accuracy and confidence (Lu et al., 19 Mar 2025).
  • Claim Segmentation–Decontextualization–Editing (SUCEA): Adversarial claims are segmented and decontextualized to yield self-contained, retriever-friendly subclaims. Iterative retrieval and editing (with constraints to avoid hallucination) further improve robustness and label accuracy, requiring no data-specific model training (Liu et al., 5 Jun 2025).
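
A framework-agnostic sketch of the iterative loop is given below; extract_next, retrieve, and verify are placeholder callables (e.g., an LLM prompt, a retriever with reranking, and an NLI verifier) rather than the concrete components of AFEV.

```python
from typing import Callable, List, Optional, Tuple

def iterative_verify(
    claim: str,
    extract_next: Callable[[str, List[Tuple[str, str]]], Optional[str]],
    retrieve: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], Tuple[str, str]],
    max_iters: int = 10,
) -> str:
    """Schematic extract-retrieve-verify loop: each new atomic fact is
    extracted conditioned on the facts and labels accumulated so far,
    verified against retrieved evidence, and per-fact verdicts are then
    aggregated into a claim-level label."""
    history: List[Tuple[str, str]] = []      # (atomic fact, label) pairs so far
    for _ in range(max_iters):
        fact = extract_next(claim, history)  # adaptive granularity: extractor sees prior verdicts
        if fact is None:                     # extractor signals the claim is exhausted
            break
        evidence = retrieve(fact)            # retrieval + reranking in practice
        label, _rationale = verify(fact, evidence)
        history.append((fact, label))
    # Aggregation: support the claim only if every extracted atomic fact is supported.
    if history and all(label == "SUPPORTED" for _, label in history):
        return "SUPPORTED"
    return "REFUTED"
```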

4. Error Taxonomy and Quality Metrics

Fact-based decomposition is subject to a range of error modes (Hu et al., 17 Oct 2024, Wanner et al., 18 Mar 2024):

| Error Type | Manifestation | Impact on Verification |
| --- | --- | --- |
| Omission | Missing core claims/logical relationships | Reduced coverage |
| Ambiguity | Unresolved pronouns/vague expressions | Invalid or ungrounded evidence |
| Over-decomposition | Redundant or excessive micro-claims | Noisy retrieval/aggregation |
| Alteration | Addition/negation of false facts | Spurious support/refutation |

Decomposition-induced noise ($E_d$) and retrieval error ($E_r$) accumulate multiplicatively in overall system accuracy: $A_{\mathrm{dec}} = A(k_d)(1 - E_d)(1 - E_r)$ (Hu et al., 17 Oct 2024). For illustration, a base accuracy $A(k_d) = 0.90$ combined with modest error rates $E_d = 0.05$ and $E_r = 0.10$ already drops effective accuracy to roughly $0.77$. Consequently, robust verification depends equally on optimal fact granularity and minimal error propagation.

Quality is assessed using metrics such as DecompScore (atomicity and coverage of subclaims), FActScore (fraction of supported subclaims), precision/recall of extracted facts, inter-annotator agreement (IAA), and bespoke attribution metrics for question answering ($Attr_p$, $Attr_r$) (Wanner et al., 18 Mar 2024, Yan et al., 22 Oct 2024).
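
As an illustration of how such metrics reduce to simple aggregates over verifier verdicts, the sketch below computes a FActScore-style precision; the label strings are placeholders, not the exact labels of any cited benchmark.

```python
from typing import List

def factscore(labels: List[str]) -> float:
    """FActScore-style precision: fraction of atomic subclaims the verifier
    judged SUPPORTED (verdicts typically come from an NLI model or LLM judge)."""
    if not labels:
        return 0.0
    return sum(label == "SUPPORTED" for label in labels) / len(labels)

# Hypothetical verdicts for five subclaims extracted from one long-form answer.
verdicts = ["SUPPORTED", "SUPPORTED", "REFUTED", "SUPPORTED", "NOT_ENOUGH_INFO"]
print(factscore(verdicts))  # 0.6
```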

5. Practical Architectures and Empirical Evaluations

  • Extractive Encoders (JEDI): Atomic fact decomposition and span-level NLI classification are performed in one encoder forward pass, eschewing generative LLMs at inference time. Joint optimization of span extraction and classification achieves competitive accuracy on ANLI, superior adversarial robustness on HANS, and full traceability of reasoning (Popovič et al., 23 Sep 2025). A generic per-fact NLI entailment check (not JEDI's joint span model) is sketched after this list.
  • Hybrid Large/Small-LM Pipelines (TSDRE): Timeline-based sentence decomposition via in-context prompts to an LLM splits events by temporal anchor, feeding atomic event sentences to a compact PLM for structured fact extraction; SOTA exact-match F1 scores on HyperRED-Temporal and ComplexTRED accentuate the gains from granular decomposition (Chen et al., 16 May 2024).
  • Attributed QA via Decomposition (ARE): Long-form answers are recursively partitioned into molecular clauses and atomic facts, with each atomic fact verified, edited, or expanded via external evidence; evidence attribution precision and preservation are tracked via tailored evaluation metrics ($Attr_p$, $Pres$), with empirical gains substantiated across multiple QA datasets (Yan et al., 22 Oct 2024).
  • Numerical Fact-Checking Benchmarks (QuanTemp++): Multiple decomposition paradigms for claim-to-query generation (oracle, LLM few-shot, programmatic, distilled encoder-decoder) are benchmarked, with query quality (completeness, redundancy) and retrieval performance directly mapped to fact-verification macro-F1. Learned decomposers (QGen) nearly match upper-bound retrieval while being tractable for real-time deployment (Venktesh et al., 24 Oct 2025).
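
Common to several of these pipelines is a per-fact entailment check of an atomic fact against retrieved evidence. The sketch below uses the publicly available roberta-large-mnli checkpoint as an assumed, illustrative verifier; it is neither JEDI's joint span model nor the exact verifier of any system above.

```python
# Per-fact verification as NLI entailment: does the evidence entail the atomic fact?
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def fact_supported(evidence: str, atomic_fact: str) -> bool:
    """Return True if the NLI model predicts ENTAILMENT for (evidence, fact)."""
    inputs = tokenizer(evidence, atomic_fact, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    label = model.config.id2label[int(logits.argmax(dim=-1))]
    return label == "ENTAILMENT"

print(fact_supported(
    "Marie Curie received the 1903 Nobel Prize in Physics.",
    "Marie Curie won a Nobel Prize in Physics.",
))
```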

6. Domain Adaptations and Visualization Tools

Decomposition methods extend to domain-specific settings:

  • Clinical Fact Decomposition (FactEHR): LLMs decompose clinical notes into atomic claims. Inter-model variation in facts-per-sentence ratios (up to 2.6×), entailment-based precision/recall, and qualitative coverage errors highlight LLM limitations. Coverage–atomicity tradeoffs are emphasized, along with a recommendation for retrieval-augmented prompting (Munnangi et al., 17 Dec 2024).
  • Temporal Fact Extraction: Timeline-based decomposition with temporal anchoring exposes previously unattainable fact–time correspondences, facilitating more accurate knowledge graph construction (Chen et al., 16 May 2024).
  • Visual Analytics for Annotation (Dissecting Atomic Facts): Semantic similarity heatmaps, fact-count histograms, and knowledge graph projections allow for iterative guideline refinement, resolution of inter-annotator disagreement, and more consistent atomic fact annotation (Schmidt et al., 1 Sep 2025).

7. Best Practices, Limitations, and Future Directions

Optimal decomposition is a function of input complexity, downstream verifier strength, and desired tradeoff between atomicity and coverage. Adaptive decomposition (e.g., dynamic setting of number of subclaims based on input length) and error reflection via meta-prompts yield the best empirical outcomes (Hu et al., 17 Oct 2024).
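
One assumed, minimal instantiation of such length-adaptive decomposition is to scale the target number of subclaims with input length, as sketched below; the tokens-per-fact constant is a hypothetical tuning knob rather than a value reported in the cited work.

```python
import math

def target_subclaims(claim: str, tokens_per_fact: int = 12,
                     min_k: int = 1, max_k: int = 8) -> int:
    """Pick the number of subclaims k from the input length instead of using a fixed k."""
    n_tokens = len(claim.split())
    return max(min_k, min(max_k, math.ceil(n_tokens / tokens_per_fact)))

print(target_subclaims("Paris is the capital of France."))  # 1
print(target_subclaims("The 2012 Olympics, hosted in London, drew athletes "
                       "from 204 nations and generated record broadcast revenue."))  # 2
```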

Limitations include computational expense of RL-based decomposers, verifier-policy dependence, and omission/hallucination errors in LLM outputs (Lu et al., 19 Mar 2025, Munnangi et al., 17 Dec 2024). Future research is directed toward joint multi-objective optimization (balancing readability, logic, and faithfulness), domain-adaptive prompting, open-source LLMs for decomposition, and unified end-to-end training of decomposition plus verification modules (Zheng et al., 9 Jun 2025, Chen et al., 16 May 2024, Venktesh et al., 24 Oct 2025).

Fact-based decomposition methods collectively define the state of the art in interpretable, robust, and modular fact verification across diverse application domains. Their continued refinement, coupled with rigorous empirical evaluation and domain tailoring, is pivotal for trustworthy NLP systems.
