An Effective Dynamic Gradient Calibration Method for Continual Learning (2407.20956v1)
Abstract: Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model on a stream of continuously arriving data and tasks. Because of memory limits, we cannot store all the historical data, and therefore face the ``catastrophic forgetting'' problem: performance on previous tasks can degrade substantially because the corresponding information is missing in later stages. Although a number of elegant methods have been proposed, catastrophic forgetting still cannot be well avoided in practice. In this paper, we study the problem from the gradient perspective, aiming to develop an effective algorithm that calibrates the gradient at each update step of the model; that is, we guide the model to update in the right direction even when a large amount of historical data is unavailable. Our idea is partly inspired by the seminal stochastic variance reduction methods (e.g., SVRG and SAGA) for reducing the variance of gradient estimation in stochastic gradient descent. A further benefit is that our approach serves as a general tool that can be incorporated into several existing popular CL methods to achieve better performance. We also conduct experiments on several benchmark datasets to evaluate the practical performance.
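To make the variance-reduction connection concrete, below is a minimal sketch of the SVRG-style update that inspires the gradient calibration idea. This is not the paper's algorithm: the least-squares objective, the names `grad_i` and `svrg`, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical least-squares problem: f(w) = (1/2n) * ||Xw - y||^2,
# with per-sample gradient grad f_i(w) = (x_i^T w - y_i) * x_i.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def grad_i(w, i):
    """Stochastic gradient of the i-th least-squares term."""
    return (X[i] @ w - y[i]) * X[i]

def svrg(epochs=20, inner_steps=400, eta=0.01):
    """SVRG: periodically snapshot the iterate, compute the full
    gradient there, and use it to calibrate each stochastic step."""
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()                  # snapshot of the iterate
        mu = X.T @ (X @ w_snap - y) / n    # full gradient at the snapshot
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Calibrated gradient: the stochastic gradient corrected by the
            # snapshot terms; the estimator stays unbiased with lower variance.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= eta * g
    return w

w_hat = svrg()
print("parameter error:", np.linalg.norm(w_hat - w_true))
```

In the continual setting, the full gradient `mu` over past tasks cannot be computed because most historical data are gone; approximating the role of this calibration term without storing all past data is exactly the gap the paper's dynamic gradient calibration targets.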
- Task-free continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11254–11263, 2019a.
- Gradient-based sample selection for online continual learning. Advances in Neural Information Processing Systems, 32, 2019b.
- Learning fast, learning slow: A general continual learning method based on complementary learning system. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
- Stop wasting my gradients: Practical SVRG. Advances in Neural Information Processing Systems, 28, 2015.
- Explainable deep learning for efficient and robust pattern recognition: A survey of recent developments. Pattern Recognition, 120:108102, 2021.
- Class-incremental continual learning into the extended DER-verse. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):5497–5512, 2022.
- Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.
- Bottou, L. et al. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes, 91(8):12, 1991.
- Dark experience for general continual learning: A strong, simple baseline. Advances in Neural Information Processing Systems, 33:15920–15930, 2020.
- Riemannian walk for incremental learning: Understanding forgetting and intransigence. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 532–547, 2018.
- Efficient lifelong learning with A-GEM. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019a.
- On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486, 2019b.
- Using hindsight to anchor past knowledge in continual learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 6993–7001, 2021.
- A continual learning survey: Defying forgetting in classification tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(7):3366–3385, 2021.
- SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in Neural Information Processing Systems, 27, 2014.
- PODNet: Pooled outputs distillation for small-tasks incremental learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX, pp. 86–102. Springer, 2020.
- Orthogonal gradient descent for continual learning. In International Conference on Artificial Intelligence and Statistics, pp. 3762–3773. PMLR, 2020.
- Towards robust evaluations of continual learning. arXiv preprint arXiv:1805.09733, 2018.
- Competing with the empirical risk minimizer in a single pass. In Conference on Learning Theory, pp. 728–763. PMLR, 2015.
- DDGR: Continual learning with deep diffusion-based generative replay. In Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., and Scarlett, J. (eds.), International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pp. 10744–10763. PMLR, 2023.
- Real-time evaluation in online continual learning: A new hope. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pp. 11888–11897. IEEE, 2023.
- Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014a.
- An empirical investigation of catastrophic forgetting in gradient-based neural networks. In Bengio, Y. and LeCun, Y. (eds.), 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014b.
- Variance-reduced methods for machine learning. Proceedings of the IEEE, 108(11):1968–1983, 2020.
- Adaptive orthogonal projection for batch and online continual learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 6783–6791, 2022.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- Learning a unified classifier incrementally via rebalancing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 831–839, 2019.
- Re-evaluating continual learning scenarios: A categorization and case for strong baselines. arXiv preprint arXiv:1810.12488, 2018.
- Distilling causal effect of data in class-incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3957–3966, 2021.
- Towards better generalization: BP-SVRG in training deep neural networks. arXiv preprint arXiv:1908.06395, 2019.
- Accelerating stochastic gradient descent using predictive variance reduction. Advances in Neural Information Processing Systems, 26, 2013.
- Achieving a better stability-plasticity trade-off via auxiliary networks in continual learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pp. 11930–11939. IEEE, 2023. doi: 10.1109/CVPR52729.2023.01148.
- Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
- Learning multiple layers of features from tiny images. 2009.
- Tiny ImageNet visual recognition challenge. CS 231N, 7(7):3, 2015.
- Non-convex finite-sum optimization via SCSG methods. Advances in Neural Information Processing Systems, 30, 2017.
- Learning without forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2935–2947, 2017.
- Continual learning with recursive gradient optimization. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net, 2022.
- Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022, 2021.
- Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems, 30, 2017.
- Online continual learning in image classification: An empirical survey. Neurocomputing, 469:28–51, 2022.
- Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pp. 109–165. Elsevier, 1989.
- Linear mode connectivity in multitask and continual learning. In International Conference on Learning Representations, 2020.
- Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71, 2019.
- Computationally budgeted continual learning: What does matter? In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pp. 3698–3707. IEEE, 2023. doi: 10.1109/CVPR52729.2023.00360.
- Ratcliff, R. Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97(2):285, 1990.
- iCaRL: Incremental classifier and representation learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010, 2017.
- Learning to learn without forgetting by maximizing transfer and minimizing interference. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
- Continual learning with deep generative replay. Advances in Neural Information Processing Systems, 30, 2017.
- GCR: Gradient coreset based replay buffer selection for continual learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 99–108, 2022.
- Three scenarios for continual learning. arXiv preprint arXiv:1904.07734, 2019.
- Vitter, J. S. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 11(1):37–57, 1985.
- A comprehensive survey of continual learning: Theory, method and application. arXiv preprint arXiv:2302.00487, 2023.
- Memory replay GANs: Learning to generate new categories without forgetting. Advances in Neural Information Processing Systems, 31, 2018.
- Reinforced continual learning. Advances in Neural Information Processing Systems, 31, 2018.
- DER: Dynamically expandable representation for class incremental learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3014–3023, 2021.
- Continual learning by modeling intra-class variation. Transactions on Machine Learning Research, 2023.
- Scene text detection and recognition: Recent advances and future trends. Frontiers of Computer Science, 10(1):19–36, 2016.