Causal factors behind performance gains in RL-based reasoning models
Determine whether the performance gains observed in reinforcement learning (RL)-trained Large Reasoning Models (LRMs) are primarily caused by (i) increased exposure to established mathematical benchmark data during training, (ii) the greater inference-time compute allocated to thinking tokens, or (iii) genuine reasoning capabilities developed through RL. The goal is to isolate and quantify the contribution of each factor under controlled experimental conditions.
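One way the isolation could be framed, offered as a minimal sketch rather than a prescribed protocol, is a 2x2x2 factorial comparison in which each of the three candidate factors is switched on or off independently and accuracy is measured in all eight conditions. The factor names, the 0/1 encoding, and the `main_effects` helper below are assumptions made for illustration; the accuracy values would have to come from actual controlled runs.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical factor names; each factor takes level 0 (off) or 1 (on).
# Order matters: it fixes the position of each factor in a condition tuple.
FACTORS = ("benchmark_exposure", "thinking_budget", "rl_training")

Condition = Tuple[int, int, int]


def main_effects(accuracy: Dict[Condition, float]) -> Dict[str, float]:
    """Return, for each factor, the mean accuracy with the factor on minus
    the mean accuracy with it off, averaged over the other two factors."""
    expected = set(product((0, 1), repeat=len(FACTORS)))
    if set(accuracy) != expected:
        raise ValueError("need one accuracy value for each of the 8 conditions")

    effects = {}
    for i, name in enumerate(FACTORS):
        on = [acc for cond, acc in accuracy.items() if cond[i] == 1]
        off = [acc for cond, acc in accuracy.items() if cond[i] == 0]
        effects[name] = sum(on) / len(on) - sum(off) / len(off)
    return effects
```

Main effects of this kind ignore interactions between factors (for example, RL training might only help when a large thinking budget is available), so a fuller analysis would also need to examine the interaction terms.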
References
Currently, it is not clear whether the performance enhancements observed in recent RL-based reasoning (thinking) models are attributable to increased exposure to established mathematical benchmark data, to the significantly greater inference compute allocated to thinking tokens, or to reasoning capabilities developed by RL-based training.