Detecting errors at the sub-claim level in LLM reasoning chains
Develop a method to identify and certify errors within the sub-claim components of reasoning chains generated by large language models. The approach should remove the assumption that claims arrive pre-decomposed into atomic units, enable fine-grained soundness assessment, and address the additional computational cost that sub-claim-level analysis introduces.
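One possible shape for such a pipeline, sketched below, is to first decompose each reasoning step into atomic sub-claims and then verify each sub-claim against the earlier steps that serve as its premises, flagging any sub-claim whose support falls below a threshold. This is not the method of the cited paper; the function names, the `decompose`/`verify` callables, and the threshold are hypothetical placeholders that would in practice be backed by an LLM decomposition prompt and an entailment-style verifier.

```python
"""Minimal sketch of sub-claim-level error detection (hypothetical, not the
paper's method). `decompose` and `verify` are assumed to be user-supplied
callables, e.g. an LLM prompt that splits a step into atomic sub-claims and
an entailment model that scores a sub-claim against prior steps."""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubClaimResult:
    text: str              # atomic sub-claim extracted from a reasoning step
    support_score: float   # verifier confidence that the premises entail it
    is_error: bool         # flagged when the score falls below the threshold


@dataclass
class StepResult:
    step_text: str
    sub_claims: List[SubClaimResult] = field(default_factory=list)

    @property
    def sound(self) -> bool:
        # A step is sound only if none of its sub-claims is flagged.
        return all(not sc.is_error for sc in self.sub_claims)


def check_chain(
    steps: List[str],
    decompose: Callable[[str], List[str]],      # step -> list of atomic sub-claims
    verify: Callable[[str, List[str]], float],  # (sub-claim, premises) -> support score
    threshold: float = 0.5,                     # hypothetical decision threshold
) -> List[StepResult]:
    """Decompose each step into sub-claims and flag unsupported ones."""
    results: List[StepResult] = []
    for i, step in enumerate(steps):
        premises = steps[:i]  # earlier steps act as the premises for this step
        step_result = StepResult(step_text=step)
        for sub_claim in decompose(step):
            score = verify(sub_claim, premises)
            step_result.sub_claims.append(
                SubClaimResult(sub_claim, score, is_error=score < threshold)
            )
        results.append(step_result)
    return results
```

The computational concern noted in the excerpt is visible in this sketch: each step incurs one decomposition call plus one verification call per sub-claim, so cost scales with the total number of sub-claims. Mitigations such as caching verifier calls or decomposing only low-confidence steps are plausible but untested assumptions here.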
References
Additionally, our approach assumes that the claims are already decomposed and, therefore, cannot detect errors at the sub-claim level. We leave this for future work, noting it would increase computational costs.
— Probabilistic Soundness Guarantees in LLM Reasoning Chains
(arXiv:2507.12948, You et al., 17 Jul 2025), Limitations section