Parameter-Level Soft-Masking for Continual Learning (2306.14775v1)
Abstract: Existing research on task incremental learning in continual learning has primarily focused on preventing catastrophic forgetting (CF). Although several techniques have achieved learning with no CF, they attain it by letting each task monopolize a sub-network in a shared network, which seriously limits knowledge transfer (KT) and causes over-consumption of the network capacity, i.e., as more tasks are learned, the performance deteriorates. The goal of this paper is threefold: (1) overcoming CF, (2) encouraging KT, and (3) tackling the capacity problem. A novel technique (called SPG) is proposed that soft-masks (partially blocks) parameter updating in training based on the importance of each parameter to old tasks. Each task still uses the full network, i.e., no monopoly of any part of the network by any task, which enables maximum KT and reduction in capacity usage. To our knowledge, this is the first work that soft-masks a model at the parameter-level for continual learning. Extensive experiments demonstrate the effectiveness of SPG in achieving all three objectives. More notably, it attains significant transfer of knowledge not only among similar tasks (with shared knowledge) but also among dissimilar tasks (with little shared knowledge) while mitigating CF.
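The abstract describes the core mechanism only at a high level: gradient updates are partially blocked per parameter according to how important that parameter is to previously learned tasks, while every task still trains the full network. Below is a minimal, hypothetical PyTorch sketch of that idea for illustration; the importance estimate used here (normalized accumulated gradient magnitude) and all names are assumptions, not the authors' actual SPG procedure.

```python
import torch
import torch.nn as nn

# Toy model and optimizer; any architecture would do.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Accumulated per-parameter importance to old tasks, kept in [0, 1].
# 0 means the parameter is free to update; 1 blocks its update entirely.
importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

def train_step(x, y):
    """One training step with parameter-level soft-masking of gradients."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    with torch.no_grad():
        for n, p in model.named_parameters():
            if p.grad is not None:
                # Soft mask: scale the gradient down by the parameter's
                # importance to old tasks (partial, not binary, blocking).
                p.grad.mul_(1.0 - importance[n])
    optimizer.step()
    return loss.item()

def update_importance_after_task(loader):
    """Illustrative importance update after finishing a task (an assumption,
    not SPG's exact rule): normalized mean absolute gradient of the task loss."""
    acc = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                acc[n] += p.grad.abs()
    with torch.no_grad():
        for n in acc:
            a = acc[n] / (acc[n].max() + 1e-12)          # normalize to [0, 1]
            importance[n] = torch.max(importance[n], a)  # keep strongest mask
```

Because the mask is soft and applied per parameter, no task ever owns a sub-network outright: less important parameters remain largely trainable for future tasks (supporting knowledge transfer and limiting capacity consumption), while heavily masked ones change little (mitigating forgetting).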