Superiority of Action Granularity Strategies in Marco-o1-MCTS

Determine which Monte Carlo Tree Search action granularity—step-level actions versus fixed-length mini-step actions of 32 or 64 tokens—yields superior performance within the Marco-o1-MCTS framework on the MGSM-English and MGSM-Chinese benchmarks when using the confidence-score-based reward computed from token log probabilities.

Background

Marco-o1 integrates LLMs with Monte Carlo Tree Search (MCTS), where actions correspond to generated reasoning steps. To improve search granularity, the authors compare treating an entire reasoning step as a single action versus splitting steps into mini-steps of 32 or 64 tokens within the MCTS framework.
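The mini-step granularity can be illustrated with a short sketch. This is not the authors' implementation; the function name and interface are assumptions, and it only shows the mechanical splitting of a step's token sequence into fixed-length chunks that then serve as MCTS actions.

```python
def split_into_ministeps(step_tokens, ministep_size=32):
    """Split one reasoning step's token sequence into fixed-length
    mini-steps (e.g. 32 or 64 tokens); the final chunk may be shorter.

    Illustrative sketch only -- not the Marco-o1 source code.
    """
    return [
        step_tokens[i : i + ministep_size]
        for i in range(0, len(step_tokens), ministep_size)
    ]


# Example: a 70-token step at 32-token granularity yields
# mini-steps of 32, 32, and 6 tokens.
chunks = split_into_ministeps(list(range(70)), ministep_size=32)
```

With step-level granularity the whole step is a single action; with mini-step granularity each chunk becomes one node expansion in the search tree, giving a finer-grained search at the cost of a deeper tree.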

Using a confidence-score-based reward (each generated token's log probability softmax-normalized against the log probabilities of its top alternative candidate tokens, then averaged over the rollout), experiments on MGSM-English and MGSM-Chinese show differing best-performing strategies (step-level for English; 32-token mini-step for Chinese) and significant randomness in tree-search outcomes. As a result, the authors explicitly state that they cannot draw definitive conclusions about which action strategy is superior, leaving this comparison unresolved under the current reward scheme.
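The confidence-score reward described above can be sketched as follows. This is a minimal illustrative reconstruction, assuming each position supplies the chosen token's log probability together with the log probabilities of its top alternative candidates (the candidate count is a parameter of the model's logprob output, not something this sketch fixes).

```python
import math


def confidence_score(positions):
    """Average per-token confidence over a rollout.

    `positions` is a list where each element is the list of candidate
    log probabilities at that token position, with the generated
    token's log probability FIRST. Each token's confidence is its
    softmax weight among those candidates; the reward is the mean.

    Illustrative sketch of the described reward, not the paper's code.
    """
    confidences = []
    for logps in positions:
        denom = sum(math.exp(lp) for lp in logps)
        confidences.append(math.exp(logps[0]) / denom)
    return sum(confidences) / len(confidences)


# A rollout where the chosen token always dominates its alternatives
# yields a confidence near 1; evenly matched candidates yield 1/k.
reward = confidence_score([
    [math.log(0.9), math.log(0.05), math.log(0.05)],
    [math.log(0.5), math.log(0.5)],
])
```

Because the reward is an average of per-token softmax weights, it is bounded in (0, 1] and can be attached to any node in the tree regardless of whether the node holds a full step or a 32/64-token mini-step, which is what makes the granularity comparison possible under one reward scheme.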

References

Currently, as shown in Figures~\ref{fig:cot-step}, \ref{fig:step-ministep32}, and \ref{fig:ministep64-step}, we cannot draw definitive conclusions about which action strategy is superior.

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions (arXiv:2411.14405, Zhao et al., 21 Nov 2024), Subsection "Main Results" of Section "Experiments" (also reiterated in the caption of Figure "ministep64-step"; label fig:ministep64-step).