Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning (2311.09506v3)
Abstract: Continual Learning (CL) has attracted attention as a way of avoiding Catastrophic Forgetting (CF) during the sequential training of neural networks, improving network efficiency and adaptability to different tasks. Additionally, CL serves as an ideal setting for studying network behavior and Forward Knowledge Transfer (FKT) between tasks. Pruning methods for CL train subnetworks to handle the sequential tasks, which allows us to take a structured approach to investigating FKT. Sharing prior subnetworks' weights leverages past knowledge for the current task through FKT. Understanding which weights to share is important, as sharing all weights can yield sub-optimal accuracy. This paper investigates how different sharing decisions affect FKT between tasks. Through this lens we demonstrate how task complexity and similarity influence the optimal weight-sharing decisions, giving insight into the relationships between tasks and helping inform decision making in similar CL methods. We implement three sequential datasets designed to emphasize variation in task complexity and similarity, reporting results for both ResNet-18 and VGG-16. By sharing weights in accordance with the decisions supported by our findings, we show improved task accuracy compared to other sharing decisions.
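To make the setup concrete, below is a minimal sketch (not the paper's implementation) of how pruning-based CL methods can share prior subnetworks' weights with the current task. It assumes one binary mask per task over a shared weight tensor; weights assigned to earlier tasks are frozen, and a per-task boolean decision controls whether those frozen weights participate in the current task's forward pass, which is the weight-sharing choice the paper studies. The class and argument names (`MaskedLinear`, `trainable_mask`, `share_prior`) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """One shared weight tensor partitioned into per-task binary masks (illustrative)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # task_masks[t] marks the weights kept for task t after pruning; they are frozen thereafter.
        self.task_masks: list[torch.Tensor] = []

    def forward(self, x: torch.Tensor, trainable_mask: torch.Tensor,
                share_prior: list[bool]) -> torch.Tensor:
        # Weights still free for the current task receive gradients as usual.
        effective = self.weight * trainable_mask
        # Frozen weights of each prior subnetwork are included only if we choose to
        # share that task's knowledge; detach() blocks gradient updates to them.
        for prior_mask, share in zip(self.task_masks, share_prior):
            if share:
                effective = effective + (self.weight * prior_mask).detach()
        return F.linear(x, effective)


# Usage sketch: task 0 trains on all free weights with nothing to share;
# a later task could pass share_prior=[True] to reuse task 0's frozen weights.
layer = MaskedLinear(16, 8)
free = torch.ones_like(layer.weight)
out = layer(torch.randn(4, 16), trainable_mask=free, share_prior=[])
```

The per-task `share_prior` flags are where the paper's sharing decisions would plug in: sharing everything corresponds to all-`True`, while the findings suggest choosing these flags based on task complexity and similarity.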