Behavior of the KL–forgetting link at frontier scale and across domains

Characterize how the empirical relationship between forward KL divergence on the new task and catastrophic forgetting behaves at frontier model scale and in diverse generative domains, and determine whether it persists or changes in these regimes.

Background

The paper demonstrates the KL–forgetting relationship across moderate-scale LLMs and a toy setting, and shows that RL's on-policy updates are biased toward KL-minimal solutions. However, whether this relationship holds at much larger (frontier) scales and in broader generative domains has not been established. The authors explicitly flag this as an open question.
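To make the quantity in question concrete, the following is a minimal sketch of how a forward KL between a base and a fine-tuned model might be averaged over new-task prompts. It is not the paper's code: the function names `forward_kl` and `mean_forward_kl` are hypothetical, the direction KL(base || fine-tuned) is an assumption about what "forward" means here, and small discrete distributions stand in for model output distributions.

```python
import math

def forward_kl(p_base, p_tuned, eps=1e-12):
    """KL(p_base || p_tuned) for two discrete distributions over the same outcomes.

    Assumption: "forward" means the expectation is taken under the base model.
    """
    return sum(
        p * math.log(p / max(q, eps))  # eps guards against log of zero
        for p, q in zip(p_base, p_tuned)
        if p > 0  # terms with p == 0 contribute nothing
    )

def mean_forward_kl(base_dists, tuned_dists):
    """Average the per-prompt KL over a collection of new-task prompts."""
    kls = [forward_kl(p, q) for p, q in zip(base_dists, tuned_dists)]
    return sum(kls) / len(kls)

# Toy data: one distribution per prompt (illustrative values only).
base = [[0.5, 0.5], [0.5, 0.5]]
tuned = [[0.5, 0.5], [0.9, 0.1]]
print(mean_forward_kl(base, tuned))
```

In a study of the open question, a statistic of this kind would be computed per fine-tuned checkpoint and compared against a measure of prior-task performance loss; at frontier scale the exact sum over outcomes would presumably have to be replaced by a sampled estimate.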

References

Moreover, while we demonstrate the KL–forgetting link across moderate-scale LLMs and toy models, its behavior at frontier scales and in more diverse generative domains remains unknown.

— RL's Razor: Why Online Reinforcement Learning Forgets Less (2509.04259 - Shenfeld et al., 4 Sep 2025) in Discussion and Conclusion
