What governs catastrophic forgetting, and why do RL and SFT differ?

Determine the underlying mechanism that governs catastrophic forgetting in foundation models and explain why supervised fine-tuning (SFT) and on-policy reinforcement learning (RL) exhibit different forgetting behavior despite achieving similar new-task performance.

Background

The paper studies catastrophic forgetting when adapting foundation models via supervised fine-tuning (SFT) and reinforcement learning (RL). Empirically, RL reaches comparable new-task performance while preserving prior capabilities better than SFT. Prior approaches targeted symptoms of forgetting (e.g., by constraining parameter changes) but did not explain its cause or why the two training algorithms behave so differently. The authors' results motivate identifying a principled mechanism that explains when and why forgetting occurs and why on-policy RL behaves differently from SFT.
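
For concreteness, the sketch below contrasts the two update rules the question compares: SFT fits the model to a fixed set of labels, whereas on-policy RL trains on actions sampled from the current policy and weighted by reward. This is a minimal PyTorch illustration on a toy contextual-bandit task, not the paper's experimental setup; the helper names (sft_step, rl_step), the linear policy, and the toy reward are assumptions made only for the example.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_contexts, n_actions = 8, 4
policy = torch.nn.Linear(n_contexts, n_actions)   # logits over actions given a one-hot context
opt = torch.optim.SGD(policy.parameters(), lr=0.1)

def sft_step(contexts, expert_actions):
    # Supervised fine-tuning: cross-entropy toward fixed expert labels.
    logits = policy(F.one_hot(contexts, n_contexts).float())
    loss = F.cross_entropy(logits, expert_actions)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def rl_step(contexts, reward_fn):
    # On-policy RL (REINFORCE): sample from the current policy, reinforce by reward.
    logits = policy(F.one_hot(contexts, n_contexts).float())
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                        # training data comes from the policy itself
    rewards = reward_fn(contexts, actions)
    loss = -(dist.log_prob(actions) * rewards).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy "new task": the correct action for context c is c % n_actions.
contexts = torch.randint(0, n_contexts, (32,))
expert = contexts % n_actions
reward_fn = lambda c, a: (a == c % n_actions).float()
print(sft_step(contexts, expert), rl_step(contexts, reward_fn))

The structural difference illustrated here, learning from externally supplied targets versus from the policy's own reward-weighted samples, is what the open question asks to connect, mechanistically, to the observed gap in forgetting.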

References

Consequently, it remains unclear what truly governs forgetting or why different training algorithms behave so differently.

RL's Razor: Why Online Reinforcement Learning Forgets Less (2509.04259 - Shenfeld et al., 4 Sep 2025) in Section 1, Introduction