Validate monotonic-concave rollout-size gains at large scale
Validate on large-scale language models whether the performance gains from increasing the rollout size N in Broad Reinforcement Learning (BroRL) are monotonic and concave, as token-level simulations suggest, and establish the empirical relationship between rollout size and performance at scale.
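One way to make the "monotonic and concave" criterion concrete is to test it on a set of (N, performance) measurements: the gains are monotonic if every secant slope between consecutive rollout sizes is non-negative, and concave if those slopes never increase. The following is a minimal sketch of that check, not a procedure from the paper. The function name `check_monotonic_concave`, the tolerance parameter `tol`, and the sample numbers are all illustrative assumptions; the numbers are placeholders, not measured results.

```python
from typing import Sequence, Tuple


def check_monotonic_concave(
    rollout_sizes: Sequence[float],
    scores: Sequence[float],
    tol: float = 0.0,
) -> Tuple[bool, bool]:
    """Return (is_monotonic, is_concave) for scores measured at increasing N.

    Hypothetical helper: `tol` absorbs evaluation noise and is an assumption.
    """
    if len(rollout_sizes) != len(scores) or len(scores) < 3:
        raise ValueError("need at least three (N, score) pairs of equal length")
    if any(b <= a for a, b in zip(rollout_sizes, rollout_sizes[1:])):
        raise ValueError("rollout sizes must be strictly increasing")

    # Secant slope between each pair of consecutive rollout sizes.
    slopes = [
        (s1 - s0) / (n1 - n0)
        for (n0, s0), (n1, s1) in zip(
            zip(rollout_sizes, scores),
            zip(rollout_sizes[1:], scores[1:]),
        )
    ]

    # Monotonic: performance never drops as N grows (up to tol).
    is_monotonic = all(slope >= -tol for slope in slopes)
    # Concave: each additional rollout buys no more than the previous one did.
    is_concave = all(b <= a + tol for a, b in zip(slopes, slopes[1:]))
    return is_monotonic, is_concave


# Placeholder values for illustration only; these are NOT results from the paper.
example_sizes = [16, 32, 64, 128]
example_scores = [0.50, 0.56, 0.60, 0.62]
example_result = check_monotonic_concave(example_sizes, example_scores)
```

With noisy evaluations, a small positive `tol` or slopes computed on averages over several seeds would likely be needed before drawing any conclusion about the trend.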
References
Our simulation results (Figure \ref{fig:simulation}) suggest that the gains are monotonic but concave, yet validating this trend on large-scale LLMs is a computationally demanding task that we leave for future work.
— BroRL: Scaling Reinforcement Learning via Broadened Exploration
(2510.01180 - Hu et al., 1 Oct 2025) in Appendix, Section: Limitations