Efficacy of LR re-warm/re-decay and replay strategies under broader continual pretraining conditions
Verify whether combining learning-rate re-warming/re-decaying with replay remains effective for continual pretraining under larger distribution shifts, at larger model and dataset scales, with infinite learning-rate schedules, with growing model architectures, and with tokenizer adaptation for larger changes in data distribution. A minimal sketch of the two ingredients in question follows.
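For concreteness, the sketch below illustrates the two components this question targets: a learning-rate schedule that is re-warmed and then re-decayed at the start of each new continual-pretraining phase, and a data stream that replays a small fraction of the previous corpus alongside the new one. This is a minimal, self-contained Python illustration under assumed settings; the peak and minimum learning rates, warmup fraction, and replay fraction are hypothetical placeholders, not values from the cited works.

```python
import math
import random

def rewarm_redecay_lr(step, phase_steps, peak_lr=3e-4, min_lr=3e-5, warmup_frac=0.01):
    """Linear re-warm followed by cosine re-decay, restarted for each
    continual-pretraining phase. `step` is the step index within the
    current phase, `phase_steps` the total steps planned for that phase.
    All hyperparameter values are illustrative placeholders."""
    warmup_steps = max(1, int(warmup_frac * phase_steps))
    if step < warmup_steps:
        # Re-warm: ramp linearly from min_lr back up to peak_lr.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Re-decay: cosine anneal from peak_lr back down to min_lr.
    progress = (step - warmup_steps) / max(1, phase_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def mixed_batch_source(new_data, old_data, replay_frac=0.05, rng=None):
    """Yield examples drawn mostly from the new corpus, with a small
    fraction replayed from the previous corpus to mitigate forgetting.
    `replay_frac` is a hypothetical setting for illustration only."""
    rng = rng or random.Random(0)
    while True:
        source = old_data if rng.random() < replay_frac else new_data
        yield rng.choice(source)

if __name__ == "__main__":
    # Toy usage: print the re-warm/re-decay schedule at a few points
    # of a 10k-step continual-pretraining phase.
    for s in (0, 50, 100, 2500, 5000, 9999):
        print(s, round(rewarm_redecay_lr(s, phase_steps=10_000), 6))
```

The open question is whether this recipe, shown effective when updating a model on two successive corpora, still holds under the broader conditions listed above (or whether alternatives such as infinite learning-rate schedules that avoid full re-decay are preferable).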
References
While the experiments updated the model on two subsequent tasks, the approach's efficacy in settings involving larger distribution shifts, model and dataset scales, infinite LR schedules, growing models, and tokenizer adaptation for handling larger changes in data distribution remains to be verified.
— Towards Incremental Learning in Large Language Models: A Critical Review (Jovanovic et al., arXiv:2404.18311, 28 Apr 2024), Section 2.1 (Continual Learning), discussing "Simple and Scalable Strategies to Continually Pre-train LLMs"