Employing MCTS During RLVR Training
Determine how to employ Monte Carlo Tree Search (MCTS) effectively during Reinforcement Learning with Verifiable Rewards (RLVR) training, rather than restricting it to inference, so that systematic exploration is integrated directly into the training process.
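To make the setting concrete, the following is a minimal sketch of one way tree search could supply training rollouts. It is not the method of the cited paper. It runs textbook UCT-style MCTS on a toy sequence task and records every completed sequence with its verifier reward, which an RLVR update could then consume. The task, the constants `ACTIONS`, `MAX_LEN` and `TARGET`, and the functions `verify`, `uct_child` and `mcts_collect` are all hypothetical names chosen for illustration, and the policy update itself is omitted.

```python
import math
import random
from dataclasses import dataclass, field

ACTIONS = (0, 1, 2)  # toy "token" vocabulary (assumption)
MAX_LEN = 3          # toy episode length (assumption)
TARGET = 6           # verifier accepts sequences summing to this (assumption)


def verify(seq):
    """Toy verifiable reward: 1.0 if the finished sequence passes the check."""
    return 1.0 if len(seq) == MAX_LEN and sum(seq) == TARGET else 0.0


@dataclass
class Node:
    seq: tuple
    parent: "Node | None" = None
    children: dict = field(default_factory=dict)
    visits: int = 0
    value_sum: float = 0.0

    def is_terminal(self):
        return len(self.seq) == MAX_LEN

    def fully_expanded(self):
        return len(self.children) == len(ACTIONS)


def uct_child(node, c=1.4):
    """Pick the child maximizing mean value plus the UCT exploration bonus."""
    def score(child):
        mean = child.value_sum / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def mcts_collect(n_sims, rng):
    """Run n_sims simulations; return the tree root and (sequence, reward) samples."""
    root = Node(seq=())
    samples = []
    for _ in range(n_sims):
        node = root
        # Selection: descend while every action at this node has been tried.
        while not node.is_terminal() and node.fully_expanded():
            node = uct_child(node)
        # Expansion: add one untried action as a new child.
        if not node.is_terminal():
            untried = [a for a in ACTIONS if a not in node.children]
            action = rng.choice(untried)
            child = Node(seq=node.seq + (action,), parent=node)
            node.children[action] = child
            node = child
        # Rollout: complete the sequence at random, then query the verifier.
        seq = node.seq
        while len(seq) < MAX_LEN:
            seq = seq + (rng.choice(ACTIONS),)
        reward = verify(seq)
        samples.append((seq, reward))  # candidate training data for an RLVR step
        # Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return root, samples


root, samples = mcts_collect(200, random.Random(0))
successes = [seq for seq, reward in samples if reward == 1.0]
print(f"{len(successes)} of {len(samples)} sampled sequences passed the verifier")
```

In this sketch the search statistics only steer which sequences get sampled. How such samples, and the tree's value estimates, should enter the policy update is the open question stated above.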
References
Despite the demonstrated potential of MCTS for heuristic exploration, it remains unclear how to effectively employ it during RLVR training.
— DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
(2509.25454 - Wu et al., 29 Sep 2025) in Appendix, Related Works, Monte-Carlo Tree Search paragraph