
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment (2502.07803v1)

Published 5 Feb 2025 in cs.AI and cs.LG

Abstract: Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of LLMs by generating natural language (NL) rationales that lead to the final answer. However, it struggles with numerical computation, which has somehow led to the development of program-aided techniques. Despite their potential, a persistent challenge remains: inconsistencies between LLM-reported reasoning steps and the logic in generated programs, which we term "reasoning hallucinations." This stems from the inherent ambiguities of NL and the statistical nature of LLMs, which often lack rigorous logical coherence. To address this challenge, we propose a novel test-time scaling framework, Reasoning-as-Logic-Units (RaLU), which constructs a more reliable reasoning path by aligning logical units between the generated program and their corresponding NL descriptions. By decomposing the initially generated program into discrete units using static analysis, RaLU engages in an iterative dialogue with the LLM to judge, refine, and explain each unit. A rewind-and-correct mechanism ensures alignment between code statements and task requirements in each unit, ultimately forming a cohesive reasoning path under the program's logic, from which the model reaches a final solution. Our experiments demonstrate that RaLU significantly outperforms existing baselines in mathematical reasoning (GSM8K, MATH) and algorithmic reasoning (HumanEval+, MBPP+), underscoring its potential to advance LLM reasoning and programming by offering enhanced accuracy and interpretability.

Summary

  • The paper presents the RaLU framework, aligning LLM natural language reasoning with program logic units to combat 'reasoning hallucinations'.
  • Experimental results show RaLU significantly improves accuracy on mathematical and algorithmic reasoning benchmarks, outperforming baselines by up to 6.60%.
  • By aligning NL and code logic, RaLU enhances LLM reasoning interpretability, reliability, and consistency, advancing capabilities for complex problem-solving.

The paper "Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in LLMs Through Logic Unit Alignment" introduces a novel framework, Reasoning-as-Logic-Units (RaLU), aimed at enhancing the reasoning capabilities of LLMs by addressing inconsistencies termed as "reasoning hallucinations." These inconsistencies arise from discrepancies between the natural language (NL) reasoning steps generated by LLMs and the logical steps in program-generated solutions.

Key Contributions:

  1. Reasoning Hallucinations: The paper identifies the issue of "reasoning hallucinations" in LLMs, where the statistical nature of these models leads to NL descriptions that are not directly aligned with the logical structures of generated programs. These are attributed to ambiguities in NL and the model's tendency to predict based on token frequencies rather than logical coherence.
  2. RaLU Framework: RaLU is proposed as a solution to mitigate reasoning hallucinations by constructing a reliable reasoning path. It does so by decomposing the initial program into discrete logical units via static analysis, then engaging in iterative dialogues with the LLM to assess, refine, and align these units with NL descriptions.
  3. Logic Unit Alignment: RaLU breaks down the program into logical units via static analysis, using representations such as the control flow graph (CFG). This allows RaLU to align these units with the LLM's NL rationales through a series of interactions, employing a rewind-and-correct mechanism to ensure cohesive alignment with task requirements (a minimal sketch of this decomposition and alignment loop appears after this list).
  4. Experimental Validation: The framework was tested on several benchmarks, including GSM8K and MATH for mathematical reasoning, and HumanEval+, MBPP+ for algorithmic reasoning. RaLU demonstrated substantial improvements in accuracy, outperforming existing baselines by significant margins (e.g., up to 6.60% in specific benchmarks).
  5. Theoretical Justification: The paper provides a theoretical justification for RaLU's effectiveness, demonstrating through Bayesian inference that the alignment of NL and programmatic representations enhances the reliability of the reasoning process, thus reducing reasoning hallucinations (a schematic version of this intuition follows the concluding paragraph below).
  6. Broader Implications: RaLU's structured hybrid reasoning process, achieved by enforcing alignment between NL and code logic, presents a significant advancement in developing more interpretable, reliable, and consistent reasoning in LLMs, contributing to their application in complex problem-solving.
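
This summary does not reproduce the paper's implementation, but the following minimal Python sketch illustrates the general shape of the mechanism described in items 2 and 3: splitting an LLM-generated program into coarse logic units at control-flow boundaries (a rough stand-in for the paper's CFG-based decomposition), then walking those units through an iterative judge-refine-explain loop with rewind-and-correct. All function names and the LLM interface here are hypothetical placeholders, not the paper's actual API.

```python
# Illustrative sketch only. The split below treats whole branch/loop blocks as
# single units rather than building a true control flow graph, and llm_judge /
# llm_refine are hypothetical callables standing in for the dialogue with the LLM.
import ast
from typing import Callable, List, Tuple

def split_into_units(source: str) -> List[str]:
    """Split a generated program into coarse logic units at control-flow
    boundaries (if/for/while), flushing straight-line code between them."""
    tree = ast.parse(source)
    units: List[str] = []
    buffer: List[ast.stmt] = []
    for stmt in tree.body:
        if isinstance(stmt, (ast.If, ast.For, ast.While)):
            if buffer:
                units.append("\n".join(ast.unparse(s) for s in buffer))
                buffer = []
            units.append(ast.unparse(stmt))  # each branch/loop is its own unit
        else:
            buffer.append(stmt)
    if buffer:
        units.append("\n".join(ast.unparse(s) for s in buffer))
    return units

def align_units(
    task: str,
    units: List[str],
    llm_judge: Callable[[str, str, List[str]], Tuple[bool, str]],
    llm_refine: Callable[[str, str, str], str],
    max_retries: int = 3,
) -> Tuple[List[str], List[str]]:
    """Walk the units in order. If a unit is judged inconsistent with the task
    (or with the rationales built so far), refine it and re-check before moving on."""
    rationales: List[str] = []
    for i in range(len(units)):
        rationale = ""
        for _ in range(max_retries):
            consistent, rationale = llm_judge(task, units[i], rationales)
            if consistent:
                break
            units[i] = llm_refine(task, units[i], rationale)  # rewind-and-correct
        rationales.append(rationale)  # keep the NL explanation aligned with this unit
    return units, rationales
```

For a typical program-aided GSM8K-style solution with an initial block of assignments, a loop, and a final computation, this split yields three units, each of which is judged and explained before the next is considered.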

The RaLU framework systematically enhances the coherence and reliability of LLM reasoning by leveraging both NL intuition and formal programmatic logic, addressing limitations inherent in traditional CoT and Program-of-Thought (PoT) prompting techniques. This structured approach promises improved performance on logical tasks and greater interpretability of LLMs' reasoning.
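
The Bayesian justification in item 5 is only summarized here; one schematic way to express the underlying intuition, assuming the NL rationale $r$ and the program $p$ are conditionally independent given the latent correct solution $s$ (an illustrative assumption, not the paper's exact derivation), is $P(s \mid r, p) \propto P(r \mid s)\,P(p \mid s)\,P(s)$: requiring the final answer to be consistent with both an aligned NL rationale and the program's logic concentrates posterior mass on solutions supported by both representations rather than by NL alone.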
