A Benchmark for Open-Domain Numerical Fact-Checking Enhanced by Claim Decomposition (2510.22055v1)

Published 24 Oct 2025 in cs.IR and cs.CL

Abstract: Fact-checking numerical claims is critical because the presence of numbers lends a mirage of veracity to claims that may be fake, potentially causing catastrophic impacts on society. Prior work in automatic fact verification does not primarily focus on natural numerical claims. A typical human fact-checker first retrieves relevant evidence addressing the different numerical aspects of the claim and then reasons about it to predict the veracity of the claim. Hence, the search process of a human fact-checker is a crucial skill that forms the foundation of the verification process. Emulating a real-world setting is essential to aid in the development of automated methods that encompass such skills. However, existing benchmarks employ heuristic claim decomposition approaches augmented with weakly supervised web search to collect evidence for verifying claims. This sometimes results in less relevant evidence and noisy sources with temporal leakage, rendering a less realistic retrieval setting for claim verification. Hence, we introduce QuanTemp++: a dataset consisting of natural numerical claims and an open-domain corpus, with the corresponding relevant evidence for each claim. The evidence is collected through a claim decomposition process that approximately emulates the approach of a human fact-checker, together with veracity labels, ensuring there is no temporal leakage. Given this dataset, we also characterize the retrieval performance of key claim decomposition paradigms. Finally, we observe their effect on the outcome of the verification pipeline and draw insights. The code for the data pipeline, along with a link to the data, can be found at https://github.com/VenkteshV/QuanTemp_Plus

Summary

The paper introduces QuanTemp++, a dataset that addresses temporal leakage and enhances evidence quality for numerical claim verification.
The study proposes the FCDecomp method, which decomposes claims into targeted queries to mimic human reasoning in evidence retrieval.
The paper demonstrates significant improvements in verification performance, setting a new benchmark for automated fact-checking systems.

Introduction

Verifying numerical claims in open-domain settings is crucial because numbers tend to convey an illusion of veracity. This false sense of truth can have significant societal impact, affecting decision-making and public perception. Previous automated fact-verification efforts have not adequately addressed the challenges posed by numerical claims, and often suffer from temporal leakage and from reliance on less relevant or noisy evidence. To address these concerns, the paper introduces QuanTemp++, a dataset designed specifically for numerical fact-checking, with improved methods for evidence retrieval and verification. The dataset takes a more human-like approach to claim verification by using claim decomposition techniques that aim to emulate how human fact-checkers search for evidence.

Dataset Construction and Methodology

QuanTemp++ introduces a systematic approach for collecting and organizing evidence for numerical claims. The dataset consists of around 15,000 natural numerical claims paired with an open-domain corpus and corresponding evidence, created with a focus on minimizing temporal leakage.

Figure 1: Data creation pipeline of QuanTemp++

The process uses a query decomposition method called FCDecomp, which generates search queries to retrieve relevant evidence by mimicking the reasoning approach of human fact-checkers. Compared with previous datasets, this yields higher-quality evidence that aligns with the numerical aspects of each claim while excluding evidence published after the claim was made. FCDecomp is supported by machine learning techniques that filter and validate the gathered evidence, so that what remains is relevant to supporting or refuting the claim.
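The temporal-leakage constraint described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the Evidence record, the drop_temporal_leakage function, the sample snippets, and the choice to keep only evidence dated strictly before the claim are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Evidence:
    # Hypothetical record type; the real dataset schema may differ.
    text: str
    published: Optional[date]  # None when the publication date is unknown


def drop_temporal_leakage(
    evidence: List[Evidence], claim_date: date, keep_undated: bool = False
) -> List[Evidence]:
    """Keep only evidence published strictly before the claim date.

    Assumption: undated evidence is dropped unless keep_undated is True,
    since its position relative to the claim cannot be checked.
    """
    kept = []
    for item in evidence:
        if item.published is None:
            if keep_undated:
                kept.append(item)
            continue
        if item.published < claim_date:
            kept.append(item)
    return kept


# Made-up snippets for illustration only.
snippets = [
    Evidence("Budget report with the original figure", date(2019, 3, 1)),
    Evidence("Fact-check article discussing the claim", date(2020, 6, 15)),
    Evidence("Undated forum post", None),
]
filtered = drop_temporal_leakage(snippets, claim_date=date(2020, 1, 10))
print([e.text for e in filtered])
```

In this sketch the later fact-check article is discarded because it postdates the claim and could reveal the verdict, which is the kind of leakage the dataset is built to avoid.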

Claim Decomposition and Retrieval

The core of the paper's methodology is the FCDecomp technique, which decomposes a claim into sub-components that guide the search for evidence. Rather than using the claim itself as the sole search query, FCDecomp follows a structured process informed by claim justifications to generate diverse, targeted queries. These queries cover both implicit and explicit aspects of the claim, increasing the likelihood of retrieving relevant information. The authors combine weak supervision with learned models to further refine query generation and evidence collection, making the fact-checking setup more robust and realistic.
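To make the contrast between claim-only retrieval and decomposition-based retrieval concrete, here is a toy sketch. It does not reproduce FCDecomp: the sub-queries are written by hand rather than generated by a model, the token-overlap scorer stands in for a real retriever, and the corpus, claim, and function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, doc):
    """Count query tokens that also occur in the document (toy scorer)."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, corpus, k=2):
    """Return ids of the k best-scoring documents with a non-zero score."""
    scored = [(overlap_score(query, text), doc_id) for doc_id, text in corpus.items()]
    scored = [(s, doc_id) for s, doc_id in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def retrieve_with_decomposition(queries, corpus, k=2):
    """Union of per-query results, preserving first-seen order."""
    seen = []
    for q in queries:
        for doc_id in retrieve(q, corpus, k):
            if doc_id not in seen:
                seen.append(doc_id)
    return seen


# Made-up corpus and claim for illustration only.
corpus = {
    "d1": "Unemployment rate fell to 4 percent in 2019 according to the labor bureau",
    "d2": "The labor bureau counts people actively seeking work as unemployed",
    "d3": "City council approves new park budget",
}
claim = "Unemployment dropped to 4 percent in 2019"
# Hand-written stand-ins for generated sub-queries (explicit and implicit aspects).
sub_queries = ["unemployment rate 2019 labor bureau", "how is unemployed defined"]

print(retrieve(claim, corpus, k=1))
print(retrieve_with_decomposition(sub_queries, corpus, k=1))
```

With the claim alone, only the document stating the figure is retrieved. The second sub-query targets an implicit aspect (how unemployment is defined) and surfaces a document the claim text does not match.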

The research questions addressed include the impact of FCDecomp on retrieving quality evidence, the comparative performance of existing claim decomposition methods, and their effect on the downstream task of claim verification. The results suggest that FCDecomp significantly improves retrieval quality, showing promise for application in realistic verification scenarios.
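Retrieval quality of this kind is commonly summarized with a rank-based metric. The sketch below uses recall@k as one plausible choice. Which metrics the paper actually reports is not stated here, so the metric, the function name, and the sample ids are assumptions.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # convention chosen for this sketch when nothing is relevant
    hits = relevant.intersection(ranked_ids[:k])
    return len(hits) / len(relevant)


# Hypothetical ranking and gold evidence ids for a single claim.
ranked = ["d4", "d1", "d7", "d2"]
gold = ["d1", "d2"]

print(recall_at_k(ranked, gold, k=2))
print(recall_at_k(ranked, gold, k=4))
```

Averaging such a score over all claims would give one number per decomposition method, which allows a like-for-like comparison of retrieval quality.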

Implications and Future Directions

QuanTemp++ is a meaningful step forward for automated fact-checking, particularly for numerical claims. Its realistic construction and its focus on avoiding common pitfalls such as temporal leakage make it a robust testbed for developing more effective fact-checking methods. Using claim decomposition to simulate the human search process also points to where retrieval and verification models can improve, suggesting directions for future research.

Future developments could explore iterative decomposition and retrieval processes to further align machine-based verification with human reasoning. Additionally, enhancing models' numerical reasoning capabilities remains an open challenge that could significantly boost performance in fact-checking contexts.

Conclusion

QuanTemp++ offers a comprehensive and realistic dataset for the challenging task of open-domain numerical fact-checking, addressing key limitations in existing benchmarks. By leveraging advanced claim decomposition techniques and focusing on realistic evidence retrieval, the dataset sets a new standard for evaluating automated fact-checking systems. This work not only advances the current methodologies but also provides a foundation for future innovations in improving the fidelity of numerical claim verification.