Does the “later data is more memorized” phenomenon hold for post-training?

Ascertain whether the empirical observation that data appearing later in the training process is more likely to be memorized extends to post-training stages of large language models, including supervised fine-tuning and reinforcement learning alignment, where loss masking and RL objectives may alter memorization dynamics.

Background

Prior work (Jagielski et al., 2022) finds that examples encountered later during training are more likely to be memorized. Barbero et al. (2025) note that post-training differs from pretraining in ways that could change this pattern: SFT often masks question tokens during loss computation, and RL-based methods have been observed to be less prone to memorization than instruction tuning under equal compute.

Verifying whether the “later data memorizes more” pattern persists in post-training phases that include SFT and RLHF/PPO would clarify how alignment datasets contribute to memorization and potential data leakage.

References

Jagielski et al. (2022) showed that data that appears later in the training process is more likely to be memorized. Our focus on memorisation of alignment and strategic proprietary data presents an interesting open-question about if this finding still holds for post-training.

— Extracting alignment data in open models (2510.18554 - Barbero et al., 21 Oct 2025) in Appendix, Section Extended background, Memorisation
