
Validate monotonic-concave rollout-size gains at large scale

Validate on large-scale language models whether the improvements from increasing the rollout size N in Broad Reinforcement Learning (BroRL) are monotonic and concave, as suggested by token-level simulations, and establish the empirical relationship between rollout size and performance at scale.


Background

Theoretical analysis and simulations in the paper indicate that increasing the rollout size N reduces detrimental unsampled coupling effects and suggest a monotonic but concave gain pattern. However, this behavior has not been confirmed on large-scale models due to computational cost.
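The qualitative shape of such a curve can be illustrated with a toy model (an assumption for illustration only, not the paper's token-level simulation): if a token has per-rollout sampling probability p, the chance it is sampled at least once among N rollouts is 1 - (1 - p)^N, which is monotone increasing but concave in N.

```python
# Toy illustration (hypothetical, not BroRL's actual simulation): coverage of a
# token with per-rollout sampling probability p after N independent rollouts.
# The quantity 1 - (1 - p)**N increases monotonically in N, but with
# diminishing (concave) gains -- the same qualitative pattern the paper's
# simulations suggest for rollout-size scaling.
def coverage(p: float, n: int) -> float:
    """Probability the token is sampled at least once in n rollouts."""
    return 1.0 - (1.0 - p) ** n

p = 0.05  # assumed per-rollout sampling probability, chosen for illustration
values = [coverage(p, n) for n in range(1, 9)]
gains = [b - a for a, b in zip(values, values[1:])]

assert all(g > 0 for g in gains)  # monotonic: each extra rollout helps
assert all(later < earlier for earlier, later in zip(gains, gains[1:]))  # concave: gains shrink
```

This toy curve only motivates why monotone-concave behavior is plausible; whether real large-scale LLM training follows it is exactly the open question above.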

The authors note that validating the trend on real LLMs, potentially across intermediate N values, is left for future work, making the large-scale empirical characterization of the N–performance curve an outstanding task.

References

Our simulation results (Figure \ref{fig:simulation}) suggest that the gains are monotonic but concave, yet validating this trend on large-scale LLMs is a computationally demanding task that we leave for future work.

BroRL: Scaling Reinforcement Learning via Broadened Exploration (2510.01180 - Hu et al., 1 Oct 2025) in Appendix, Section: Limitations