Mechanistic account for why larger forward KL shifts disrupt prior knowledge
Establish a mechanistic explanation for why a larger forward Kullback–Leibler (KL) divergence between the fine-tuned policy and the base policy, measured on the new-task distribution, leads to degraded prior-task performance, and identify whether representational interference, implicit capacity limits, or other dynamics are responsible.
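To make the quantity in question concrete, below is a minimal Monte Carlo sketch of the forward KL on the new-task distribution. Following the wording above, it estimates KL(π_ft ‖ π_base) with completions sampled from the fine-tuned policy; the function and argument names (`forward_kl_estimate`, `logp_ft`, `logp_base`, `sample`) are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def forward_kl_estimate(logp_ft, logp_base, prompts, sample, n_samples=64):
    """Monte Carlo estimate of E_{x ~ D_new, y ~ pi_ft}[log pi_ft(y|x) - log pi_base(y|x)].

    logp_ft, logp_base: callables mapping (prompt, completion) -> total log-prob.
    sample: callable drawing a completion y ~ pi_ft(. | prompt).
    All names are illustrative; any policy interface exposing these operations works.
    """
    gaps = []
    for x in prompts:
        for _ in range(n_samples):
            # Forward KL samples from the *fine-tuned* policy, per the stated direction.
            y = sample(x)
            gaps.append(logp_ft(x, y) - logp_base(x, y))
    return float(np.mean(gaps))

# Toy check with categorical "policies" over a 3-symbol vocabulary.
rng = np.random.default_rng(0)
p_ft, p_base = np.array([0.7, 0.2, 0.1]), np.array([0.4, 0.4, 0.2])
est = forward_kl_estimate(
    logp_ft=lambda x, y: np.log(p_ft[y]),
    logp_base=lambda x, y: np.log(p_base[y]),
    prompts=[None],
    sample=lambda x: rng.choice(3, p=p_ft),
    n_samples=20_000,
)
print(est, np.sum(p_ft * np.log(p_ft / p_base)))  # estimate vs. exact KL (~0.18)
```

With enough samples the estimate converges to the exact KL, which is what makes this measurable per checkpoint during fine-tuning; the open question is why larger values of this quantity predict forgetting, not how to compute it.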
References
However, we still lack a mechanistic account of why larger KL shifts on the new task disrupt prior knowledge—whether through representational interference, implicit capacity limits, or other dynamics.
— Shenfeld et al., "RL's Razor: Why Online Reinforcement Learning Forgets Less," arXiv:2509.04259, 4 Sep 2025, Discussion and Conclusion.