Reason from Future: Reverse Thought Chain Enhances LLM Reasoning (2506.03673v1)

Published 4 Jun 2025 in cs.AI

Abstract: It has been demonstrated that carefully designed reasoning paradigms, like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), can enhance the reasoning capabilities of small LLMs through detailed thinking and extensive thought searching, but unbounded branching factors in the search space create prohibitive reasoning cost. Moreover, these methods fall into the trap of local-optimum reasoning, meaning the model lacks a global perspective while solving problems. We propose a novel reasoning paradigm called Reason from Future (RFF), which generates reasoning paths by bidirectional reasoning that combines top-down planning with bottom-up reasoning accumulation. The essence of RFF lies in its reverse reasoning mechanism, which prioritizes core logical relationships and imposes goal-oriented constraints on intermediate steps, thereby reducing the search space and mitigating the error accumulation inherent in sequential forward reasoning. Empirical evaluations across diverse experiments demonstrate that RFF outperforms conventional paradigms, achieving higher accuracy with a smaller search space on complex tasks.

Summary

  • The paper presents RFF, a novel bidirectional reasoning framework that alternates reverse planning with forward execution to reduce error propagation.
  • It details two strategies—RFF-T for search trees and RFF-G for DAGs—that significantly improve model accuracy and computational efficiency.
  • Experimental results show that RFF outperforms traditional methods with higher accuracy and robust performance against redundant or noisy inputs.

The paper "Reason from Future: Reverse Thought Chain Enhances LLM Reasoning" (2506.03673) introduces a novel reasoning paradigm called Reason from Future (RFF) designed to enhance the problem-solving capabilities of LLMs. It addresses limitations in existing methods like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), which tend to focus on local steps and can lead to inefficient search or accumulation of errors due to a lack of a global perspective on the problem.

RFF proposes a bidirectional reasoning approach that combines top-down planning with bottom-up reasoning accumulation. The core idea is to guide the reasoning process by starting from the desired end state (the "future") and working backward to identify intermediate goals. This reverse thinking step is then followed by a forward reasoning step that aims to reach the newly defined intermediate goal from the current state. This alternating process of generating a "last step" goal and then taking a forward step towards it imposes goal-oriented constraints on intermediate steps, effectively reducing the search space and mitigating the error propagation common in purely sequential forward reasoning.
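To make the alternation concrete, here is a small hand-worked trace on a Game of 24 instance. It is an illustration constructed for this summary, not an example from the paper; the specific numbers and goal wording are assumptions. The states and intermediate goals are written as plain data so the effect of each backward step on the next forward step is visible.

```python
# Hand-worked RFF-style trace on Game of 24 with the numbers {1, 2, 3, 4}
# (illustrative only; the concrete numbers and wording are assumptions).

state = {1, 2, 3, 4}                  # numbers still available
target = "make 24"                    # original goal (the "future")

# Backward step (Last Step Generator): propose a plausible final operation,
# e.g. 4 * 6 = 24, which turns the goal into a nearer intermediate target.
target = "obtain 6 from {1, 2, 3} (then 4 * 6 = 24)"

# Forward step (Stepwise Forward Reason): advance one step toward that target.
state = {1, 6, 4}                     # used 2 * 3 = 6

# Next round: 1 * 6 = 6 satisfies the intermediate target, so the State Check
# confirms that 4 * 6 = 24 is reachable and the reasoning terminates.
```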

The RFF framework consists of three main components; a minimal code sketch of how they interact follows the list:

  1. Last Step Generator (G(·)): This component performs the backward reasoning step. Given the current state and the current target state, it generates a preceding state that is closer to the target, explicitly defining the transition needed between this preceding state and the current target. This generated preceding state becomes the new target for the subsequent forward step.
  2. Stepwise Forward Reason (R(·)): This component performs the forward reasoning step. Given the current state and the new target state (generated by G(·)), it generates the next state in the forward path towards the target. The paper describes two strategies for this:
    • RFF-T: Suitable for tasks modeled as search trees (e.g., Game of 24), where reaching a correct final state is the goal and wrong paths need to be avoided or backtracked from. This strategy incorporates avoiding previously failed attempts.
    • RFF-G: Suitable for tasks modeled as directed acyclic graphs (DAGs) (e.g., mathematical problems), where intermediate results are generally useful or harmless, and the reasoning path accumulates information from previous steps.
  3. State Check (C(·)): This component determines when the reasoning process terminates. It checks if the current state has reached the latest target state. Similar to the forward reasoner, there are two strategies:
    • RFF-T: Checks whether the current state coincides with the target state, or whether the target can be reached in one step. Includes a Verifier V(·) to confirm that a reached state is on a correct path and to trigger backtracking if necessary.
    • RFF-G: Checks if the information required by the target state has been solved or is already present, preventing overthinking.
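A minimal sketch of how these components might be orchestrated in the RFF-T (search-tree) setting is shown below. The helper functions, prompts, and the `llm` callable are hypothetical stand-ins introduced for illustration, not the paper's prompts or implementation; only the control flow (alternate G and R, check with C, verify with V, and backtrack on failure) follows the description above.

```python
from typing import Callable, List, Optional

# All helpers below are hypothetical stand-ins for LLM prompt calls; the prompts and
# their parsing are assumptions made for this sketch, not the paper's actual prompts.

def last_step_generator(llm: Callable[[str], str], state: str, target: str) -> str:
    # G(.): backward step -- propose a preceding state that is closer than `target`,
    # with the transition linking them; it becomes the new intermediate target.
    return llm(f"Current state: {state}\nCurrent target: {target}\n"
               "Propose the state immediately before the target and the step that reaches it.")

def forward_reason(llm: Callable[[str], str], state: str, target: str,
                   avoid: List[str]) -> str:
    # R(.): forward step -- move one step from `state` toward the intermediate `target`,
    # steering away from previously failed attempts (RFF-T behaviour).
    return llm(f"Current state: {state}\nIntermediate target: {target}\n"
               f"Avoid these failed states: {avoid}\nTake exactly one forward step.")

def state_check(llm: Callable[[str], str], state: str, target: str) -> bool:
    # C(.): does the current state coincide with the latest target, or reach it in one step?
    return "yes" in llm(f"Does the state '{state}' reach the target '{target}' "
                        "in at most one step? Answer yes or no.").lower()

def verifier(llm: Callable[[str], str], state: str, goal: str) -> bool:
    # V(.): confirm that the reached state actually lies on a correct path to the goal.
    return "yes" in llm(f"Does the state '{state}' lead to a correct solution of "
                        f"'{goal}'? Answer yes or no.").lower()

def reason_from_future_t(initial_state: str, goal: str,
                         llm: Callable[[str], str], max_iters: int = 10) -> Optional[str]:
    """Illustrative RFF-T loop: alternate a backward goal-setting step with a forward
    reasoning step, backtracking when the verifier rejects a path (a sketch, not the paper's code)."""
    state, target = initial_state, goal
    failed: List[str] = []                              # wrong paths to avoid revisiting
    for _ in range(max_iters):
        if state_check(llm, state, target):             # C(.): latest target reached?
            if verifier(llm, state, goal):              # V(.): on a correct path -> done
                return state
            failed.append(state)                        # otherwise backtrack and retry
            state, target = initial_state, goal
            continue
        target = last_step_generator(llm, state, target)    # G(.): nearer target
        state = forward_reason(llm, state, target, failed)  # R(.): one step toward it
    return None                                         # search budget exhausted
```

A caller would supply a string-to-string `llm` function (for example, a thin wrapper around Llama3-8B-Instruct) and invoke something like `reason_from_future_t("numbers: 1 2 3 4", "make 24", llm)`. The RFF-G variant described above would, under the same assumptions, replace the backtracking with accumulation of intermediate results and a check that the information required by the target is already present.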

The paper evaluates RFF on several benchmarks using Llama3-8B-Instruct and Qwen2.5-7B-Instruct as base models, comparing it against methods like CoT, ToT, AoT, Cumulative Reasoning (CR), Least-to-Most, and Give-me-Hint.

Key experimental results demonstrate RFF's practical advantages:

  • Game of 24: On this search-tree task, RFF-T achieves significantly higher accuracy than baselines (including strong models like GPT-4 using CR) while visiting substantially fewer states. This highlights RFF's efficiency in pruning the search space due to its goal-directed nature. For example, Llama3-8B with RFF (n=10) reached 96% accuracy with 15 visited states, compared to Llama3-8B with CR (n=5) at 19% accuracy with 89.8 states, or GPT-4 with CR (n=5) at 94% accuracy with 13.7 states.
  • Math Problems: On datasets like GSM8K, SVAMP, ASDiv, and MATH-500 (modeled as DAGs), RFF-G consistently outperforms baselines in accuracy. For instance, Llama3-8B-Instruct with RFF achieved an average accuracy of 75.4% across the four datasets, compared to 67.8% for CoT and 68.3% for CR. The gap between RFF and CoT was larger on weaker models, suggesting RFF is particularly beneficial when the base model's inherent reasoning ability is lower. RFF's State Checker helps prevent overthinking on simpler problems, where CR sometimes struggles.
  • Commonsense Problems: On CommonQA and LogiQA, RFF-G shows competitive or superior performance compared to baselines. On LogiQA, a harder benchmark, RFF and CR demonstrated significant improvements over CoT.
  • Studies on Redundant Thinking: Introducing redundant information (adding a "1" to Game of 24 puzzles) significantly degraded the performance of CR, while RFF maintained high accuracy with fewer visited states, showcasing its robustness to irrelevant information.
  • Studies on Robust Thinking: Evaluating on GSM-Symbolic variants (semantic-preserving transformations of GSM8K problems) showed that while accuracy dropped for both methods, RFF exhibited a more stable and higher-accuracy distribution across problem variants compared to CoT, indicating more robust reasoning.

The paper includes an appendix detailing the two RFF strategies and provides prompts used for the Last Step Generator and Stepwise Forward Reasoner for Game of 24 and math problems, illustrating how the paradigm is translated into practical prompting. An experiment in the appendix also supports the design choice of alternating bidirectional steps ("Pair Reasoning") over generating the full backward chain initially ("Single Reasoning"), demonstrating the importance of generating new information during the reasoning process.

In summary, RFF offers a practical approach to improving LLM reasoning by introducing a goal-oriented, bidirectional perspective that guides the forward search. This leads to increased accuracy and improved efficiency, particularly for complex problems and when using less capable base models, making it a promising paradigm for deploying LLMs in applications requiring systematic problem-solving. The main limitation acknowledged is the reliance on the base model's ability to perform accurate reverse thinking, and the authors suggest future work might explore fine-tuning or reinforcement learning to enhance this aspect.
