Cause of pass@k gap narrowing in RL post-training
Determine whether the observed narrowing of pass@k performance gaps between reinforcement-learning–post-trained large language models and their base models is explained, at least in part, by evaluating and RL-training on tasks for which base models already achieve high pass@k from pretraining, leaving RL with little incentive to teach genuinely new skills.
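For concreteness, pass@k here is the probability that at least one of k sampled completions solves the task. Below is a minimal sketch of the standard unbiased pass@k estimator, with illustrative numbers that are not taken from the paper; it shows how a base model with even a few correct completions per problem can leave little pass@k headroom at large k.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn without replacement from n completions (c of them correct) is correct,
    i.e. 1 - C(n-c, k) / C(n, k), computed as a numerically stable product."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# Illustrative (hypothetical) numbers: 3 correct out of 64 sampled completions.
print(pass_at_k(n=64, c=3, k=1))   # ~0.047 — low pass@1
print(pass_at_k(n=64, c=3, k=32))  # ~0.88  — already high pass@32
```

If the evaluation and RL-training tasks mostly look like the second case, the base model's high pass@k at large k bounds how much visible gap RL can open or close, which is the effect the research question asks to isolate.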
References
We conjecture that this observation arises, at least in part, from evaluating and RL training on tasks where base models already achieve high pass@$k$, possibly due to pretraining on similar tasks that is beyond the control of most academic researchers; thus RL has little incentive to learn a skill that the base model already has.
— From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
(2509.25123 - Yuan et al., 29 Sep 2025) in Section 1, Introduction