Quantify Trait Amplification vs New Learning in RLMT Gains
Determine the relative contributions of (i) amplification of traits already present in the base language model and (ii) acquisition of new traits to the performance improvements observed in models trained with Reinforcement Learning with Model-rewarded Thinking (RLMT). Separate the gains by stage, attributing them to the supervised fine-tuning (SFT) warm-start versus the subsequent RLMT phase, to inform the design of improved post-training pipelines.
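One way to operationalize this question is to score each checkpoint (base, SFT warm-start, RLMT) for a set of behavioral traits on a fixed prompt set, then decompose the end-to-end gain into per-stage deltas and classify each trait by whether the base model already exhibited it. The minimal sketch below assumes such trait-prevalence scores have already been computed; the trait names, scores, and presence threshold are illustrative assumptions, not values from the paper.

```python
"""Hypothetical sketch: decompose RLMT gains into per-stage contributions
and classify each trait as 'amplified' (present in the base model) vs
'newly acquired'. All trait names, scores, and the threshold below are
illustrative assumptions, not measurements from Bhaskar et al. (2025)."""

# Trait prevalence in [0, 1] at each checkpoint, e.g. the fraction of
# responses exhibiting the trait on a fixed evaluation prompt set.
trait_scores = {
    #  trait name            base   SFT    RLMT
    "explicit_planning":    (0.10,  0.55,  0.80),
    "self_verification":    (0.02,  0.05,  0.40),
    "enumerates_options":   (0.30,  0.40,  0.65),
}

PRESENCE_THRESHOLD = 0.05  # below this in the base model => treat as absent


def decompose(base: float, sft: float, rlmt: float) -> dict:
    """Attribute the total gain to the SFT warm-start and the RLMT stage,
    and classify the trait by whether the base model already showed it."""
    return {
        "total": rlmt - base,
        "sft_gain": sft - base,   # contribution of the SFT warm-start
        "rl_gain": rlmt - sft,    # contribution of RLMT on top of SFT
        "kind": "amplified" if base >= PRESENCE_THRESHOLD else "newly_acquired",
    }


for trait, (base, sft, rlmt) in trait_scores.items():
    d = decompose(base, sft, rlmt)
    print(f"{trait:20s} {d['kind']:15s} "
          f"SFT {d['sft_gain']:+.2f}  RL {d['rl_gain']:+.2f}  total {d['total']:+.2f}")
```

Classifying a trait as amplified versus newly acquired by a base-model presence threshold is only one simple choice; alternatives include checking whether the base model surfaces the trait under higher-temperature sampling or under thinking-style prompts before counting it as absent.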
References
While our work finds the effectiveness of training LMs with thinking, it is unclear how much of the improvement is due to amplification of traits already present in the model, versus the learning of new traits during the SFT warm-start or RL training.
— Language Models that Think, Chat Better
(Bhaskar et al., 24 Sep 2025, arXiv:2509.20357), in the Limitations and Future Work discussion of the Conclusion section