Choosing between MCTS and expansive tree-based sampling in MALT
Determine whether Monte Carlo Tree Search (MCTS) or expansive tree-based sampling is the better trajectory generation method for the MALT multi-agent LLM training pipeline, and identify the conditions under which each approach is more effective and more efficient for synthetic data generation and search over the generator–verifier–refiner sequence.
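To make the trade-off concrete, here is a minimal sketch of the two ingredients being compared. It assumes a fixed branching factor `n` at each of the three stages. The `sample` function is a hypothetical stand-in for an LLM call and `ucb1` is the standard UCB1 selection score often used in MCTS; neither is taken from the MALT code.

```python
import math

def sample(stage, parent, k):
    """Hypothetical stand-in for one LLM call: returns a labelled output."""
    return f"{parent}/{stage}{k}"

def expansive_tree_sampling(question, n):
    """Sample n children at every stage, giving n**3 full trajectories."""
    trajectories = []
    for g in range(n):
        gen = sample("g", question, g)          # generator output
        for v in range(n):
            ver = sample("v", gen, v)           # verifier critique of gen
            for r in range(n):
                ref = sample("r", ver, r)       # refiner answer given ver
                trajectories.append((gen, ver, ref))
    return trajectories

def ucb1(mean_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score for choosing which child to expand next in MCTS."""
    if visits == 0:
        return math.inf                         # always try unvisited children
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)

trajs = expansive_tree_sampling("q", n=3)
print(len(trajs))  # 27
```

Under these assumptions, expansive sampling costs n + n² + n³ model calls per question regardless of output quality, while an MCTS variant would spend a fixed call budget on the branches with the highest `ucb1` score. Which of the two yields better training data for a given budget is the open question.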
References
We also leave the choice between MCTS and an expansive tree-based sampling strategy as an open problem.
— MALT: Improving Reasoning with Multi-Agent LLM Training
(2412.01928 - Motwani et al., 2 Dec 2024) in Section 6 (Discussion)