Primal Dual Continual Learning: Balancing Stability and Plasticity through Adaptive Memory Allocation (2310.00154v2)
Abstract: Continual learning is inherently a constrained learning problem. The goal is to learn a predictor under a no-forgetting requirement. Although several prior studies formulate it as such, they do not solve the constrained problem explicitly. In this work, we show that it is both possible and beneficial to tackle the constrained optimization problem directly. To do this, we leverage recent results in constrained learning through Lagrangian duality. We focus on memory-based methods, where a small subset of samples from previous tasks can be stored in a replay buffer. In this setting, we analyze two versions of the continual learning problem: a coarse approach with constraints at the task level and a fine approach with constraints at the sample level. We show that dual variables indicate the sensitivity of the optimal value of the continual learning problem with respect to constraint perturbations. We then leverage this result to partition the buffer in the coarse approach, allocating more resources to harder tasks, and to populate the buffer in the fine approach, including only impactful samples. We derive a deviation bound on dual variables as sensitivity indicators, and empirically corroborate this result on diverse continual learning benchmarks. We also discuss the limitations of these methods with respect to the amount of memory available and the expressiveness of the parametrization.
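The coarse, task-level variant lends itself to a short illustration. Below is a minimal PyTorch-style sketch, not the authors' code: it alternates a primal descent step on the Lagrangian (current-task loss plus dual-weighted replay losses) with a projected dual ascent step on the constraint slacks, and then uses the dual variables to split the replay buffer so that harder tasks, i.e. those with larger duals, receive more memory. The forgetting tolerance `epsilon`, the step sizes, and the function names are illustrative assumptions.

```python
# A minimal sketch of the coarse (task-level) primal-dual scheme, assuming a
# PyTorch classifier and one replay batch per previous task. Names such as
# `epsilon`, `dual_lr`, and the function signatures are illustrative, not the
# paper's interface.
import torch
import torch.nn.functional as F


def primal_dual_step(model, optimizer, current_batch, replay_batches,
                     lambdas, epsilon=0.05, dual_lr=0.1):
    """One primal descent step on the Lagrangian, then one dual ascent step.

    replay_batches: list of (x, y) batches, one per previous task.
    lambdas: 1-D tensor of non-negative dual variables, one per previous task.
    """
    x, y = current_batch
    replay_losses = []

    # Primal step: current-task loss plus dual-weighted constraint terms.
    optimizer.zero_grad()
    lagrangian = F.cross_entropy(model(x), y)
    for lam, (xb, yb) in zip(lambdas, replay_batches):
        loss_k = F.cross_entropy(model(xb), yb)
        lagrangian = lagrangian + lam.detach() * (loss_k - epsilon)
        replay_losses.append(loss_k.detach())
    lagrangian.backward()
    optimizer.step()

    # Dual step: projected gradient ascent on the constraint slacks, so the
    # dual variable grows for tasks whose replay loss exceeds the tolerance.
    if replay_losses:
        with torch.no_grad():
            slack = torch.stack(replay_losses) - epsilon
            lambdas = torch.clamp(lambdas + dual_lr * slack, min=0.0)
    return lambdas


def allocate_buffer(lambdas, buffer_size):
    """Split the replay buffer in proportion to the dual variables:
    tasks with larger duals (harder constraints) get more slots."""
    weights = lambdas / lambdas.sum().clamp(min=1e-8)
    # Rounded counts may be off by a slot; a real implementation would
    # redistribute the remainder.
    return torch.round(weights * buffer_size).long()
```

In the fine, sample-level variant described in the abstract, the same sensitivity reading applies per constraint: each buffered sample carries its own dual variable, and samples whose duals remain at zero correspond to inactive constraints, so only the impactful samples need to be kept in the buffer.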
- Juan Elenter
- Navid NaderiAlizadeh
- Tara Javidi
- Alejandro Ribeiro