Superiority of Action Granularity Strategies in Marco-o1-MCTS
Determine which Monte Carlo Tree Search action granularity—step-level actions versus fixed-length mini-step actions of 32 or 64 tokens—yields superior performance within the Marco-o1-MCTS framework on the MGSM-English and MGSM-Chinese benchmarks when using the confidence-score-based reward computed from token log probabilities.
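To make the compared quantities concrete, here is a minimal Python sketch of one plausible reading of the setup. It assumes that each generated token's confidence is its probability normalized against a small set of top alternative tokens (a softmax over their log probabilities), and that a rollout's reward is the mean of those confidences. The function names (`token_confidence`, `rollout_reward`, `mini_steps`) and the input layout are hypothetical and chosen only for illustration; they are not taken from the Marco-o1 code.

```python
import math

def token_confidence(chosen_logprob, candidate_logprobs):
    """Normalize the chosen token's probability over the candidate set.

    Assumption: candidate_logprobs holds the log probabilities of the top
    alternatives at this position, including the chosen token itself.
    """
    denom = sum(math.exp(lp) for lp in candidate_logprobs)
    return math.exp(chosen_logprob) / denom

def rollout_reward(chosen_logprobs, candidates_per_token):
    """Average the per-token confidences over one rollout (assumed reward)."""
    scores = [
        token_confidence(chosen, candidates)
        for chosen, candidates in zip(chosen_logprobs, candidates_per_token)
    ]
    return sum(scores) / len(scores)

def mini_steps(token_ids, size):
    """Split a token sequence into fixed-length actions, e.g. size 32 or 64.

    The last chunk may be shorter than `size`.
    """
    return [token_ids[i:i + size] for i in range(0, len(token_ids), size)]
```

Under these assumptions, the step-level strategy would treat a whole reasoning step as one action, while `mini_steps(tokens, 32)` and `mini_steps(tokens, 64)` yield the finer-grained actions whose relative merit the problem asks about.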
References
Currently, as shown in Figures~\ref{fig:cot-step}, \ref{fig:step-ministep32}, and \ref{fig:ministep64-step}, we cannot draw definitive conclusions about which action strategy is superior.
— Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
(2411.14405 - Zhao et al., 21 Nov 2024) in Subsection “Main Results,” Section “Experiments” (also reiterated in the caption of Figure ‘ministep64-step’; label fig:ministep64-step)