Interpretable Contrastive Monte Carlo Tree Search Reasoning
The paper "Interpretable Contrastive Monte Carlo Tree Search Reasoning" introduces SC-MCTS, a novel Monte Carlo Tree Search (MCTS) reasoning algorithm designed to significantly enhance both the accuracy and speed of reasoning in large language models (LLMs). The authors aim to address several notable challenges observed in previous MCTS applications: slower speeds than the Chain of Thought (CoT) method, limited analysis of individual components, and insufficient development of the reward models central to MCTS. With SC-MCTS, the authors claim notable improvements over existing methods, verified in particular through performance on the Blocksworld multi-step reasoning dataset.
Key Contributions
1. Reward Model Development:
The authors emphasize the significance of the reward model in MCTS by conducting thorough ablation studies that reveal the impact of each component on MCTS reasoning performance. They introduce a highly interpretable reward model built on contrastive decoding principles, which proves robust in guiding the reasoning processes of LLMs. This approach computes the Jensen-Shannon divergence between an expert and an amateur model's output distributions, yielding a reward mechanism grounded at the action level rather than the token level.
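The contrastive reward described above can be sketched as follows. This is a minimal illustration, assuming the reward is the Jensen-Shannon divergence between the expert and amateur models' next-token distributions, averaged over the tokens of one reasoning action; the function names and the averaging scheme are assumptions for illustration, not the paper's exact implementation.

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions given as prob lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def action_level_reward(expert_dists, amateur_dists):
    """Average token-level JSD over the tokens of one action (reasoning step)."""
    scores = [js_divergence(p, q) for p, q in zip(expert_dists, amateur_dists)]
    return sum(scores) / len(scores)

# Toy example: an action spanning two token positions, vocabulary of size 3.
expert = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
amateur = [[0.4, 0.4, 0.2], [0.3, 0.4, 0.3]]
reward = action_level_reward(expert, amateur)
```

Aggregating per-token divergences into a single per-action score is what makes the signal usable as an MCTS reward: each tree node corresponds to one reasoning step, not one token.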
2. Speed Enhancement via Speculative Decoding:
The authors address the speed limitations by incorporating speculative decoding into MCTS, yielding an average speed improvement of 51.9% per node. Speculative decoding accelerates inference by having a smaller LLM draft multiple candidate continuations, which the larger model then verifies, improving throughput without any retraining.
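The draft-then-verify loop behind speculative decoding can be sketched as below. This is a simplified greedy-acceptance version: the small draft model proposes k tokens, the large target model keeps the longest matching prefix and supplies one corrected (or bonus) token. `draft_next` and `target_next` are hypothetical stand-ins for real model calls, and real implementations verify all k positions in a single batched forward pass.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    # Draft phase: the small model proposes k tokens autoregressively.
    proposed = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Verify phase: the target model re-scores the same positions
    # (simulated here token by token) and accepts matching proposals.
    accepted = []
    ctx = list(prefix)
    for t in proposed:
        expected = target_next(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    else:
        accepted.append(target_next(ctx))  # bonus token when all k accepted
    return prefix + accepted

# Toy models over a fixed "ground truth" sequence: the draft model agrees
# with the target for the first 3 tokens, then drifts.
truth = [1, 2, 3, 4, 5, 6]
target_next = lambda ctx: truth[len(ctx)]
draft_next = lambda ctx: truth[len(ctx)] if len(ctx) < 3 else 0
out = speculative_step([], draft_next, target_next, k=4)  # -> [1, 2, 3, 4]
```

Because verification only accepts tokens the target model would have produced anyway, the output distribution is unchanged; the speedup comes from amortizing the large model's cost over several draft tokens per call.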
3. Improvements in Node Selection and Backpropagation:
The authors refine the UCT (Upper Confidence bounds applied to Trees) node selection strategy and the backpropagation process to further improve MCTS performance. They present evidence that conventional UCT tuning may be suboptimal for LLM tasks, proposing instead an exploration constant tailored to the reward model's value range, derived from extensive empirical data. They also introduce a novel backpropagation strategy that prefers solution paths showing steady, progressive gains.
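The two refinements above can be sketched together. The exploration-constant choice and the "steady gain" bonus below are illustrative assumptions inspired by the paper's description, not its exact formulas: the constant `c` would be scaled to the reward model's observed value range, and the bonus rewards paths whose per-step rewards are non-decreasing.

```python
import math

class Node:
    def __init__(self, parent=None):
        self.parent, self.children = parent, []
        self.visits, self.value = 0, 0.0

def uct_score(node, c):
    """Standard UCT: exploitation plus c-weighted exploration term."""
    if node.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def select_child(node, c):
    # c should be tuned to the scale of the reward model's values,
    # rather than a generic default such as sqrt(2).
    return max(node.children, key=lambda ch: uct_score(ch, c))

def backpropagate(leaf, path_rewards, steady_bonus=0.1):
    """Propagate the terminal reward up the tree, with a small bonus
    (an assumed form) for paths whose step rewards never decrease."""
    steady = all(a <= b for a, b in zip(path_rewards, path_rewards[1:]))
    reward = path_rewards[-1] + (steady_bonus if steady else 0.0)
    node = leaf
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

The intended effect is that two paths ending at the same final reward are no longer interchangeable: the one that improved monotonically accumulates slightly more value, biasing future selection toward consistent reasoning trajectories.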
Numerical Results and Evaluation
The SC-MCTS algorithm showed superior performance, outperforming the OpenAI o1-mini model by an average of 17.4% on the Blocksworld multi-step reasoning dataset when using the Llama-3.1-70B model. The experiments compare SC-MCTS with other methods and validate that each component contributes significantly to the performance gains. Notably, combining multiple reward models with normalization techniques yields a more effective decision-making framework.
Implications and Future Directions
This work has several implications for the development of sophisticated reasoning algorithms in LLMs. The improvements in reasoning speed and interpretability of SC-MCTS hold potential for broader applications across complex task domains where timely and accurate multi-step reasoning is critical. The authors suggest that future research could further generalize MCTS multi-step reasoning capabilities by exploring more dynamic step expansion techniques, potentially adapting their approach to other complex datasets beyond Blocksworld. Moreover, integrating these advancements with compositional reasoning frameworks represents promising territory for further enhancing the utility and applicability of MCTS in AI systems.
In summary, the paper offers a significant advance in MCTS-based LLM reasoning, laying a foundation for future work aimed at fully exploiting MCTS's potential. Through meticulous methodological refinement, SC-MCTS addresses existing challenges and establishes a robust framework for interpretable and efficient reasoning in AI systems.