Effective scaling of RLVR
Determine effective methods and principles for scaling Reinforcement Learning with Verifiable Rewards (RLVR) to improve the reasoning capabilities of large language models, identifying which scaling axes and training designs yield reliable performance gains.
References
"Yet, how to effectively scale the RLVR paradigm remains an open question."
— BroRL: Scaling Reinforcement Learning via Broadened Exploration
(arXiv:2510.01180, Hu et al., 1 Oct 2025), Section 1 (Introduction)