Does the “later data is more memorized” phenomenon hold for post-training?
Determine whether the empirical observation that data seen later in training is more likely to be memorized also holds in the post-training stages of large language models, including supervised fine-tuning and reinforcement-learning-based alignment, where loss masking and RL objectives may alter memorization dynamics.
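For context, loss masking in supervised fine-tuning commonly means that only response tokens contribute to the training loss, so prompt tokens receive no direct gradient signal. The sketch below illustrates that idea in plain Python. The function name `masked_nll`, the mask layout, and the toy probabilities are illustrative assumptions and are not taken from the cited paper.

```python
import math


def masked_nll(token_log_probs, loss_mask):
    """Mean negative log-likelihood over tokens whose mask entry is truthy.

    Hypothetical helper for illustration only: tokens with mask 0 are
    skipped entirely, so they add nothing to the loss.
    """
    if len(token_log_probs) != len(loss_mask):
        raise ValueError("token_log_probs and loss_mask must have the same length")
    kept = [-lp for lp, m in zip(token_log_probs, loss_mask) if m]
    if not kept:
        # No token is trained on, so the loss is defined as zero here.
        return 0.0
    return sum(kept) / len(kept)


# Toy sequence: two prompt tokens followed by two response tokens.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
mask = [0, 0, 1, 1]  # assumed layout: 0 = prompt (ignored), 1 = response (trained on)

sft_loss = masked_nll(log_probs, mask)           # response tokens only
full_loss = masked_nll(log_probs, [1, 1, 1, 1])  # every token, as in unmasked training
```

Under this kind of masking the prompt tokens are excluded from the objective, which is one reason memorization dynamics in post-training may differ from those observed in pre-training.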
References
Jagielski et al. (2022) showed that data that appears later in the training process is more likely to be memorized. Our focus on memorisation of alignment and strategic proprietary data presents an interesting open-question about if this finding still holds for post-training.
— Extracting alignment data in open models
(2510.18554 - Barbero et al., 21 Oct 2025) in Appendix, Section Extended background, Memorisation