Behavior of the KL–forgetting link at frontier scale and across domains
Characterize how the empirical relationship between forward KL divergence on the new task and catastrophic forgetting behaves in frontier-scale models and in more diverse generative domains, and determine whether it persists or changes in these regimes.
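To make the measured quantity concrete, here is a minimal sketch of a forward KL estimate on new-task inputs. It assumes that "forward KL" means KL(base || fine-tuned) averaged over new-task inputs, and that each model's output can be represented as a small categorical distribution per input. The function name `forward_kl`, the `eps` parameter, and the toy arrays are hypothetical illustrations, not the cited paper's code or data. For an actual language model the outcome space is too large to enumerate, so the quantity would have to be estimated from sampled outputs instead.

```python
import numpy as np

def forward_kl(base_probs, tuned_probs, eps=1e-12):
    """Mean over inputs of KL(base || tuned) for categorical distributions.

    Assumption: both arrays have shape (num_inputs, num_outcomes) and each
    row is a probability distribution over the same set of outcomes.
    """
    p = np.asarray(base_probs, dtype=float)
    q = np.asarray(tuned_probs, dtype=float)
    # Clip before taking logs to avoid log(0); outcomes with p == 0
    # still contribute 0 because p multiplies the log ratio.
    log_ratio = np.log(np.clip(p, eps, None)) - np.log(np.clip(q, eps, None))
    per_input = np.sum(p * log_ratio, axis=1)
    return float(per_input.mean())

# Toy data (hypothetical): two new-task inputs, two possible outcomes each.
base = np.array([[0.5, 0.5], [0.9, 0.1]])
tuned = np.array([[0.5, 0.5], [0.5, 0.5]])
kl_value = forward_kl(base, tuned)
```

Studying the open question would then mean pairing such a KL value with a measured drop in prior-task performance for each fine-tuned model, and checking whether the relationship observed at moderate scale still holds.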
References
Moreover, while we demonstrate the KL–forgetting link across moderate-scale LLMs and toy models, its behavior at frontier scales and in more diverse generative domains remains unknown.
— RL's Razor: Why Online Reinforcement Learning Forgets Less
(2509.04259 - Shenfeld et al., 4 Sep 2025) in Discussion and Conclusion