Utility of Text-Only Chain-of-Thought for Human Verification

Determine whether text-only chain-of-thought explanations generated by large language models improve the ability of end users (students, teachers, and non-expert verifiers) to understand and verify mathematical reasoning.

Background

LLMs frequently produce chain-of-thought (CoT) explanations to show intermediate reasoning steps on math and reasoning tasks. Although CoT improves model performance on benchmarks, its presentation is typically long and text-heavy, which may impose cognitive load on end users and impede error detection.

The paper motivates interactive alternatives to CoT by noting that it is not established whether standard text-only CoT outputs actually help human users comprehend and verify mathematical reasoning, particularly in educational contexts where clarity and error checking are essential.

References

Despite these advances, it remains unclear whether text-only CoT explanations actually help end users—students, teachers, or non-expert verifiers—understand and check mathematical reasoning.

Improving Human Verification of LLM Reasoning through Interactive Explanation Interfaces (2510.22922, Zhou et al., 27 Oct 2025), Section 1 (Introduction)