Mechanistic account for why larger forward KL shifts disrupt prior knowledge
Establish a mechanistic explanation for why a larger forward Kullback–Leibler (KL) divergence between the fine-tuned policy and the base policy, measured on the new-task distribution, leads to degraded prior-task performance, and identify whether representational interference, implicit capacity limits, or other dynamics are responsible.
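To make the quantity in question concrete, below is a minimal Monte Carlo sketch of the forward KL on the new-task distribution. Following the wording above, it estimates KL(π_ft ‖ π_base) with completions sampled from the fine-tuned policy; the function and argument names (`forward_kl_estimate`, `logp_ft`, `logp_base`, `sample`) are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def forward_kl_estimate(logp_ft, logp_base, prompts, sample, n_samples=64):
    """Monte Carlo estimate of E_{x ~ D_new, y ~ pi_ft}[log pi_ft(y|x) - log pi_base(y|x)].

    logp_ft, logp_base: callables mapping (prompt, completion) -> total log-prob.
    sample: callable drawing a completion y ~ pi_ft(. | prompt).
    All names are illustrative; any policy interface exposing these operations works.
    """
    gaps = []
    for x in prompts:
        for _ in range(n_samples):
            # Forward KL samples from the *fine-tuned* policy, per the stated direction.
            y = sample(x)
            gaps.append(logp_ft(x, y) - logp_base(x, y))
    return float(np.mean(gaps))

# Toy check with categorical "policies" over a 3-symbol vocabulary.
rng = np.random.default_rng(0)
p_ft, p_base = np.array([0.7, 0.2, 0.1]), np.array([0.4, 0.4, 0.2])
est = forward_kl_estimate(
    logp_ft=lambda x, y: np.log(p_ft[y]),
    logp_base=lambda x, y: np.log(p_base[y]),
    prompts=[None],
    sample=lambda x: rng.choice(3, p=p_ft),
    n_samples=20_000,
)
print(est, np.sum(p_ft * np.log(p_ft / p_base)))  # estimate vs. exact KL (~0.18)
```

With enough samples the estimate converges to the exact KL, which is what makes this measurable per checkpoint during fine-tuning; the open question is why larger values of this quantity predict forgetting, not how to compute it.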
References
However, we still lack a mechanistic account of why larger KL shifts on the new task disrupt prior knowledge—whether through representational interference, implicit capacity limits, or other dynamics.
— Shenfeld et al., "RL's Razor: Why Online Reinforcement Learning Forgets Less," arXiv:2509.04259, 4 Sep 2025, Discussion and Conclusion.