Class Gradient Projection For Continual Learning (2311.14905v1)
Abstract: Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL). Recent approaches tackle this problem by projecting the gradient update orthogonal to the gradient subspace of existing tasks. While the results are remarkable, these approaches ignore the fact that the calculated gradients are not guaranteed to be orthogonal to the gradient subspace of each class, owing to class deviation within tasks, e.g., distinguishing "Man" from "Sea" vs. differentiating "Boy" from "Girl". This strategy may therefore still cause catastrophic forgetting for some classes. In this paper, we propose Class Gradient Projection (CGP), which calculates the gradient subspace from individual classes rather than tasks. Gradient updates orthogonal to the gradient subspace of existing classes can be effectively utilized to minimize interference from other classes. To improve generalization and efficiency, we further design a Base Refining (BR) algorithm that combines similar classes and refines class bases dynamically. Moreover, we leverage a contrastive learning method to improve the model's ability to handle unseen tasks. Extensive experiments on benchmark datasets demonstrate the effectiveness of our proposed approach: it improves on previous methods by 2.0% on the CIFAR-100 dataset.
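The two core operations the abstract describes — extracting a per-class subspace basis and projecting a new gradient orthogonal to it — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the `energy` retention threshold, the function names, and the use of flattened per-layer gradients are all assumptions for the sake of the example.

```python
import numpy as np

def class_basis(feats, energy=0.95):
    """Extract an orthonormal basis for one class's representation
    subspace via SVD of its feature matrix `feats` (n_samples x d).
    `energy` (hypothetical threshold) is the fraction of squared
    singular values the retained basis must capture."""
    u, s, _ = np.linalg.svd(feats.T, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum, energy)) + 1
    return u[:, :k]  # (d, k), orthonormal columns

def project_gradient(grad, basis):
    """Remove the component of `grad` that lies in the span of
    `basis`, so the resulting update is orthogonal to the stored
    class subspace and does not interfere with that class."""
    return grad - basis @ (basis.T @ grad)

# Usage: build a basis from one class's features, then project a
# gradient for a new class away from it.
rng = np.random.default_rng(0)
feats = rng.standard_normal((32, 8))   # 32 samples, 8-dim features
B = class_basis(feats)
g = rng.standard_normal(8)
g_proj = project_gradient(g, B)
```

Because the columns of `B` are orthonormal, `B.T @ g_proj` is (numerically) zero, which is exactly the orthogonality constraint the method enforces per class rather than per task.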