- The paper introduces the CPG framework that integrates model pruning, critical weight selection, and network expansion to prevent catastrophic forgetting.
- CPG employs a learnable binary mask to pick the previously learned weights most useful for a new task, reusing accumulated knowledge efficiently while new information is learned with the released weights.
- Experimental results on CIFAR-100 and fine-grained tasks demonstrate that CPG maintains high accuracy with minimal model expansion compared to state-of-the-art methods.
Continual Learning via Compacting, Picking, and Growing
The paper "Compacting, Picking and Growing for Unforgetting Continual Learning" proposes a novel approach to tackle the challenge of catastrophic forgetting in continual learning models. It introduces an incremental learning framework known as Compacting, Picking, and Growing (CPG), which integrates model compression, critical weight selection, and progressive network expansion, maintaining model compactness while continually learning new tasks.
Methodology Overview
The CPG approach leverages three core principles:
- Compacting: After a task is learned, model pruning removes redundant weights without sacrificing that task's performance, freeing capacity for subsequent tasks. The weights kept for previous tasks are then frozen, so their performance is preserved exactly (the sketch above illustrates this bookkeeping).
- Picking: A learnable binary mask selects, from the frozen weights preserved for earlier tasks, those critical to the new task. This reuses accumulated knowledge efficiently and avoids the inertia seen in methods that reuse the entire previous model for new-task training (a sketch of such a mask follows this list).
- Growing: When the compacted model and the picked old weights cannot reach the accuracy goal for a new task, the architecture is expanded with new filters or nodes. The network can therefore handle a long, in principle unbounded, sequence of tasks without a fixed capacity limit hindering learning.
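The picking step can be illustrated with a Piggyback-style learnable binary mask. The following is a minimal sketch assuming PyTorch; the class name, threshold value, and straight-through estimator below are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PickingLinear(nn.Module):
    """Picks useful frozen old-task weights for a new task via a learnable
    binary mask (Piggyback-style sketch)."""
    def __init__(self, old_weight: torch.Tensor, threshold: float = 0.005):
        super().__init__()
        # Weights belonging to previous tasks are frozen (stored as a buffer).
        self.register_buffer("old_weight", old_weight)
        # One real-valued, trainable mask score per frozen weight.
        self.mask_scores = nn.Parameter(torch.full_like(old_weight, 0.01))
        self.threshold = threshold

    def forward(self, x):
        hard = (self.mask_scores > self.threshold).float()
        # Straight-through estimator: binary gate in the forward pass,
        # identity gradient to the real-valued scores in the backward pass.
        gate = hard + self.mask_scores - self.mask_scores.detach()
        return F.linear(x, self.old_weight * gate)

# Usage: pick useful old-task weights for the new task by training the mask.
old_w = torch.randn(4, 8)                      # frozen weights from earlier tasks
layer = PickingLinear(old_w)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
loss = layer(torch.randn(2, 8)).pow(2).mean()  # stand-in new-task loss
loss.backward()
opt.step()                                     # only the mask scores are updated
```

Only the mask scores receive gradient updates, so the frozen old-task weights are reused without being altered; in CPG such a mask is trained jointly with the released (free) weights and any newly grown weights for the current task.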
Experimental Results
The paper validates the approach across several datasets and task sequences, comparing CPG against state-of-the-art approaches such as ProgressiveNet, PackNet, and Piggyback. The results show that CPG maintains accuracy on both earlier and new tasks while keeping the model compact.
- On CIFAR-100 split into 20 tasks, CPG maintained high accuracy across all tasks with only a marginal increase in model size, whereas competing methods grow noticeably larger.
- On a sequence of fine-grained image classification tasks, CPG improved on the performance baselines established by strong ImageNet pre-trained models.
- In realistic scenarios involving facial-informatic tasks, CPG outperformed fine-tuning and matched the performance of independently trained per-task models without significant model expansion.
Implications and Future Directions
The CPG framework offers a promising approach to lifelong learning: it scales to new tasks while preserving past knowledge through efficient reuse of model capacity. Its ability to expand the model dynamically while managing unused capacity suggests it is practical for dynamic environments where the task sequence cannot be predetermined.
Future work could explore automated selection of the pruning and picking hyperparameters and adapt the approach to scenarios with unknown task boundaries. Integrating techniques such as channel pruning might yield further model compactness without performance trade-offs.
CPG positions itself as a robust tool in the continual-learning toolkit, pairing memory efficiency with performance fidelity, a balance that is critical to advancing AI's capabilities in truly lifelong learning contexts.