Towards Redundancy-Free Sub-networks in Continual Learning (2312.00840v2)
Abstract: Catastrophic Forgetting (CF) is a prominent issue in continual learning. Parameter isolation addresses this challenge by masking a sub-network for each task to mitigate interference with old tasks. However, these sub-networks are constructed based on weight magnitude, which does not necessarily correspond to weight importance, so unimportant weights are retained and the resulting sub-networks are redundant. To overcome this limitation, inspired by the information bottleneck principle, which removes redundancy between adjacent network layers, we propose \textbf{\underline{I}nformation \underline{B}ottleneck \underline{M}asked sub-network (IBM)} to eliminate redundancy within sub-networks. Specifically, IBM accumulates valuable information into essential weights to construct redundancy-free sub-networks, not only effectively mitigating CF by freezing the sub-networks but also facilitating the training of new tasks through the transfer of valuable knowledge. Additionally, IBM decomposes hidden representations to automate the construction process and make it flexible. Extensive experiments demonstrate that IBM consistently outperforms state-of-the-art methods. Notably, IBM surpasses the state-of-the-art parameter isolation method with a 70\% reduction in the number of parameters within sub-networks and an 80\% decrease in training time.
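To make the idea concrete, below is a minimal sketch (not the authors' code) of how an information-bottleneck-style gate could be used to carve out a sparse sub-network for a task: each hidden unit is multiplied by a learned Gaussian gate, a KL-style penalty pushes redundant units toward noise, and thresholding the gates after training yields a binary mask whose selected weights can then be frozen. The class name `IBGate`, the penalty weight, and the pruning threshold are hypothetical choices for illustration only.

```python
# Sketch of an information-bottleneck gate for building a per-task sub-network mask,
# in the spirit of variational IB compression; hyperparameters are illustrative.
import torch
import torch.nn as nn


class IBGate(nn.Module):
    """Multiplicative Gaussian gate z ~ N(mu, sigma^2) applied per hidden unit."""

    def __init__(self, dim: int):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(dim))                  # gate mean
        self.log_sigma = nn.Parameter(-3.0 * torch.ones(dim))    # gate log std

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        if self.training:
            eps = torch.randn_like(h)
            z = self.mu + torch.exp(self.log_sigma) * eps        # reparameterization trick
        else:
            z = self.mu
        return h * z

    def kl(self) -> torch.Tensor:
        # Penalty log(1 + mu^2 / sigma^2): large when a unit's signal dominates the
        # injected noise, so minimizing it suppresses units the task loss does not need.
        alpha = torch.exp(2 * self.log_sigma) / (self.mu ** 2 + 1e-8)
        return torch.log1p(1.0 / (alpha + 1e-8)).sum()

    def keep_mask(self, threshold: float = 1.0) -> torch.Tensor:
        # Keep only units whose signal-to-noise ratio clearly exceeds the threshold.
        snr = (self.mu ** 2) / (torch.exp(2 * self.log_sigma) + 1e-8)
        return snr > threshold


# Usage sketch: train with task loss + beta * gate KL, then threshold the gates to
# obtain a binary sub-network mask and freeze the selected weights for the finished task.
layer = nn.Linear(128, 64)
gate = IBGate(64)
x = torch.randn(32, 128)
loss = gate(layer(x)).pow(2).mean() + 1e-3 * gate.kl()          # dummy task loss + IB penalty
loss.backward()
print(gate.keep_mask().float().mean())                          # fraction of units retained
```

In a continual-learning setting, the retained units would define the task's sub-network, and their incoming weights would be frozen before the next task begins, while pruned-away capacity remains free for future tasks.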