Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning (2402.09542v3)

Published 14 Feb 2024 in cs.LG

Abstract: In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activations of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory.
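To make the abstract's core idea concrete: a layerwise proximal update of this kind can be derived by trading off the usual gradient step against changes to a layer's outputs on replayed inputs. For a layer with weight W, gradient G, step size eta, and replay activations stacked row-wise in a matrix A, minimizing <G, dW> + (1/(2*eta))*||dW||^2 + (1/(2*eps))*||dW A^T||^2 over dW gives the closed-form preconditioned step dW = -eta * G * (I + (eta/eps) * A^T A)^{-1}. The sketch below illustrates this preconditioned form in PyTorch. It is a minimal illustration under these assumptions, not the authors' implementation; the layer shapes, the damping constant c, and the learning rate are invented for the example.

    import torch

    def lpr_precondition(grad, buffer_acts, c=0.1):
        """Right-multiply a weight gradient by P = (I + c * A^T A)^{-1}.

        buffer_acts (A) holds replayed input activations, one row per
        stored example. Directions that would strongly change the layer's
        outputs on past data are damped, so those hidden activations can
        only drift gradually. c is an assumed damping hyperparameter.
        """
        d = buffer_acts.shape[1]
        P = torch.linalg.inv(torch.eye(d) + c * buffer_acts.T @ buffer_acts)
        return grad @ P

    # Toy usage on a single linear layer (all shapes and constants assumed).
    layer = torch.nn.Linear(32, 10)
    x_new = torch.randn(8, 32)                  # incoming stream batch
    y_new = torch.randint(0, 10, (8,))
    buffer_acts = torch.randn(64, 32)           # stand-in for stored replay inputs

    loss = torch.nn.functional.cross_entropy(layer(x_new), y_new)
    loss.backward()
    with torch.no_grad():
        # Precondition the weight gradient; bias left untouched for brevity.
        layer.weight.grad = lpr_precondition(layer.weight.grad, buffer_acts)
        for p in layer.parameters():
            p -= 0.1 * p.grad                   # plain SGD step, lr assumed
        layer.zero_grad()

In the full method this preconditioning would be applied at every layer, with each layer's preconditioner refreshed from the current replay buffer as it evolves, which is what makes the approach "layerwise".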

Authors (4)
  1. Jason Yoo (4 papers)
  2. Yunpeng Liu (55 papers)
  3. Frank Wood (98 papers)
  4. Geoff Pleiss (41 papers)
Citations (3)