A Unified and General Framework for Continual Learning (2403.13249v1)
Abstract: Continual Learning (CL) focuses on learning from dynamic and changing data distributions while retaining previously acquired knowledge. Various methods have been developed to address catastrophic forgetting, including regularization-based, Bayesian-based, and memory-replay-based techniques. However, these methods lack a unified framework and common terminology for describing their approaches. This research bridges that gap by introducing a comprehensive and overarching framework that reconciles the existing methodologies, recovering established CL approaches as special instances of a single general optimization objective. An intriguing finding is that, despite their diverse origins, these methods share common mathematical structures, revealing their interconnectedness through a shared underlying objective. Moreover, the proposed general framework introduces an innovative concept called refresh learning, specifically designed to enhance CL performance. This approach draws inspiration from neuroscience, where the human brain often sheds outdated information to improve the retention of crucial knowledge and to facilitate the acquisition of new information. In essence, refresh learning operates by first unlearning the current data and then relearning it. It serves as a versatile plug-in that integrates seamlessly with existing CL methods, offering an adaptable and effective enhancement to the learning process. Extensive experiments on CL benchmarks and theoretical analysis demonstrate the effectiveness of the proposed refresh learning. Code is available at \url{https://github.com/joey-wang123/CL-refresh-learning}.
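To make the "unlearn, then relearn" idea concrete, the sketch below shows how such a refresh step could wrap an existing CL update in PyTorch. This is a minimal illustration under stated assumptions, not the paper's exact method: the function name `refresh_step`, the placeholder `base_cl_loss` (standing in for whatever objective the underlying replay- or regularization-based method already optimizes), and the choice of a single gradient-ascent step as the unlearning operation are all illustrative; see the linked repository for the authors' implementation.

```python
# Hypothetical sketch of a refresh-learning update: unlearn the current batch,
# then relearn it, wrapped around an arbitrary base CL objective.
import torch


def refresh_step(model, base_cl_loss, batch, lr_unlearn=1e-3, lr_relearn=1e-3):
    """One refresh update: gradient ascent (unlearn) then descent (relearn)."""
    x, y = batch

    # 1) Unlearn: ascend the loss on the current data so the model sheds
    #    (possibly outdated) information about this batch.
    loss = base_cl_loss(model, x, y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(lr_unlearn * g)  # gradient ASCENT

    # 2) Relearn: a standard descent step on the same data, starting from the
    #    perturbed parameters.
    loss = base_cl_loss(model, x, y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(lr_relearn * g)  # gradient DESCENT

    return loss.item()
```

In use, a step like this would simply replace the base method's per-batch optimizer step, which is what makes the idea attractive as a plug-in: the underlying CL loss and memory mechanism stay untouched.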
Authors: Zhenyi Wang, Yan Li, Li Shen, Heng Huang