Identify sources of performance gains in RL-based post-training
Determine the primary contributors to the performance improvements observed in large language models after reinforcement-learning-based post-training methods such as GRPO and PPO. Isolate and quantify the effects of pretraining, the RL fine-tuning procedure, stochastic training dynamics (including random seeds and data ordering), and intrinsic architectural strength, so that experimental confounds are removed and causal attributions become reliable.
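A minimal sketch, not taken from the paper, of one way such an attribution could be tested: run several independent training runs (varying only seed and data order) with and without RL fine-tuning, then compare the mean RL gain against the run-to-run noise, e.g. with Welch's t-test. All benchmark scores below are hypothetical placeholders.

# Hypothetical benchmark accuracies, one entry per (random seed, data order) run.
import numpy as np
from scipy import stats

base_scores = np.array([0.412, 0.405, 0.431, 0.418, 0.409, 0.424])  # pretraining only
rl_scores   = np.array([0.436, 0.441, 0.428, 0.452, 0.433, 0.447])  # after GRPO/PPO

# Seed-to-seed noise: how much runs differ with no RL at all.
seed_noise = base_scores.std(ddof=1)

# Raw gain naively attributed to RL fine-tuning.
raw_gain = rl_scores.mean() - base_scores.mean()

# Welch's t-test: is the gain distinguishable from stochastic training dynamics?
t_stat, p_value = stats.ttest_ind(rl_scores, base_scores, equal_var=False)

print(f"seed-to-seed std (no RL): {seed_noise:.4f}")
print(f"mean gain after RL:       {raw_gain:.4f}")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.4f}")
# A gain comparable in size to seed_noise, or a large p-value, suggests the
# improvement may reflect stochastic training dynamics rather than RL itself.

A full study would extend the same comparison to different pretrained checkpoints and architectures to separate the remaining factors.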
References
While effective, these methods introduce new experimental confounds—it becomes unclear whether performance gains stem from pretraining, RL fine-tuning, stochastic training dynamics, or architectural strength.
— Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
(2512.17351 - Allen-Zhu, 19 Dec 2025) in Section 1 (Introduction), Challenge 3: Grokking, Data Quality, and Curriculum Learning