Lifelong Learning with Dynamically Expandable Networks (1708.01547v11)

Published 4 Aug 2017 in cs.LG

Abstract: We propose a novel deep network architecture for lifelong learning which we refer to as Dynamically Expandable Network (DEN), that can dynamically decide its network capacity as it trains on a sequence of tasks, to learn a compact overlapping knowledge sharing structure among tasks. DEN is efficiently trained in an online manner by performing selective retraining, dynamically expands network capacity upon arrival of each task with only the necessary number of units, and effectively prevents semantic drift by splitting/duplicating units and timestamping them. We validate DEN on multiple public datasets under lifelong learning scenarios, on which it not only significantly outperforms existing lifelong learning methods for deep networks, but also achieves the same level of performance as the batch counterparts with substantially fewer parameters. Further, the network fine-tuned on all tasks obtains significantly better performance than the batch models, which shows that it can be used to estimate the optimal network structure even when all tasks are available from the start.

Citations (1,125)

Summary

  • The paper presents a novel DEN that dynamically adjusts its capacity for lifelong learning on sequential tasks.
  • It leverages selective retraining, dynamic expansion, and neuron duplication to efficiently share and preserve knowledge while mitigating negative transfer.
  • Experimental results on MNIST-Variation, CIFAR-100, and AWA show DEN achieving near batch-trained accuracy with only 11.9% to 60.3% of the parameters of the single-task baselines.

Lifelong Learning with Dynamically Expandable Networks

In the domain of lifelong learning for deep neural networks, the paper proposes a novel architecture called the Dynamically Expandable Network (DEN). The central innovation of DEN is its ability to decide network capacity dynamically while training on a sequence of tasks, learning a compact, overlapping knowledge-sharing structure among tasks. The method is distinctive in combining selective retraining, dynamic network expansion, and strategies for preventing semantic drift, thereby addressing many of the challenges of lifelong learning.

Overview of DEN

DEN trains the first task under sparsity regularization to obtain a sparse network structure; for each subsequent task, only the parts of the network relevant to that task are selectively retrained. This partial retraining is computationally efficient and mitigates the negative transfer often seen when the entire network is retrained. When the existing capacity is insufficient for a new task, DEN expands dynamically, adding only the necessary number of units and thus keeping a balance between model complexity and performance.
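
To make the selective-retraining step concrete, here is a minimal sketch (not the authors' code): a single frozen hidden layer stands in for the existing network, and an L1-regularized fit of the new task's output weights picks out the hidden units worth retraining. The data, regularization strength, and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Minimal sketch of DEN-style selective retraining (illustrative only).
# A frozen ReLU layer stands in for the existing network's top hidden layer;
# the Lasso fit mirrors the L1-regularized fit of the new task's output
# connections, whose nonzero weights mark the hidden units (and, transitively,
# their lower-layer ancestors) that would be retrained.

rng = np.random.default_rng(0)
W_frozen = rng.normal(size=(20, 32))        # existing input->hidden weights (kept fixed)

def hidden(X):
    return np.maximum(X @ W_frozen, 0.0)    # frozen ReLU features

# New task's data (synthetic, for illustration)
X_new = rng.normal(size=(200, 20))
y_new = (X_new[:, 0] - X_new[:, 3] > 0).astype(float)

# Step 1: sparse fit of the new output layer on top of the frozen features.
lasso = Lasso(alpha=0.05, max_iter=10_000).fit(hidden(X_new), y_new)

# Step 2: nonzero coefficients identify the relevant sub-network.
selected_units = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)
print(f"retrain only {len(selected_units)} of 32 hidden units:", selected_units)
# Step 3 (not shown): retrain only the weights feeding these units on the new task.
```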

Key components of the DEN include:

  1. Selective Retraining: Rather than retraining the entire network, DEN identifies and retrains only the relevant subnetwork (sketched just above), significantly reducing computational overhead and limiting negative transfer.
  2. Dynamic Network Expansion: When the existing neurons cannot sufficiently represent a new task, the network grows its capacity, adding new units whose usefulness is enforced through group sparsity regularization so that only needed units are retained (see the first sketch after this list).
  3. Network Split/Duplication: To counter semantic drift, neurons whose meaning changes drastically across tasks are duplicated, preserving the features learned for earlier tasks while allowing new tasks to reshape the network as needed.
  4. Timestamped Inference: Units introduced for a particular task are timestamped so that older tasks never use newer units, further preventing semantic drift (splitting and timestamping are illustrated in the second sketch after this list).
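
The group-sparsity step used during expansion can be sketched as block soft-thresholding of each candidate unit's incoming weight vector: units whose entire weight group is driven to zero are pruned. This is a simplified, stand-alone illustration with made-up weights and an illustrative regularization strength, not the paper's training loop.

```python
import numpy as np

# Illustrative sketch of group-sparsity pruning during dynamic expansion.
# Each candidate new unit's incoming weight column is one "group"; the
# group-lasso proximal operator (block soft-thresholding) zeroes groups
# that are not useful, and those candidate units are dropped.

def group_soft_threshold(W_new, lam):
    """Shrink each column (one candidate unit's incoming weights) toward zero."""
    norms = np.linalg.norm(W_new, axis=0, keepdims=True)
    scale = np.clip(1.0 - lam / np.maximum(norms, 1e-12), 0.0, None)
    return W_new * scale

rng = np.random.default_rng(1)
# 4 candidate units with incoming weights of different magnitudes (made-up values)
W_candidates = rng.normal(scale=[[0.05, 0.8, 0.02, 0.6]], size=(16, 4))

W_shrunk = group_soft_threshold(W_candidates, lam=0.5)
kept = np.flatnonzero(np.linalg.norm(W_shrunk, axis=0) > 0)
print("candidate units kept after group-sparsity pruning:", kept)
```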

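Similarly, split/duplication and timestamped inference can be sketched compactly: a unit whose incoming weights drift beyond a threshold after training on a new task is duplicated, the duplicate is timestamped with the current task, and inference for an older task masks out units with later timestamps. Shapes, the drift threshold, and the synthetic weights below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of unit split/duplication and timestamped inference (illustrative values).

rng = np.random.default_rng(2)
W_before = rng.normal(size=(16, 6))               # hidden weights before training on task t
W_after = W_before.copy()
W_after[:, 2] += rng.normal(scale=1.0, size=16)   # unit 2 drifts strongly on task t

timestamps = np.zeros(6, dtype=int)               # original units belong to task 0
task_id, sigma = 1, 1.5                           # current task id and drift threshold

drift = np.linalg.norm(W_after - W_before, axis=0)
for j in np.flatnonzero(drift > sigma):
    # Split: keep the drifted copy for the new task, restore the old copy for old tasks.
    W_after = np.column_stack([W_after, W_after[:, j]])  # duplicated (new) unit
    timestamps = np.append(timestamps, task_id)          # timestamp the duplicate
    W_after[:, j] = W_before[:, j]                       # old unit keeps its old meaning

# Timestamped inference: a task introduced at time t only sees units with timestamp <= t.
usable_for_task0 = np.flatnonzero(timestamps <= 0)
print("units visible to task 0:", usable_for_task0)
print("total units after split:", W_after.shape[1])
```
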
Experimental Validation

The performance of DEN was assessed on multiple datasets including MNIST-Variation, CIFAR-100, and AWA. DEN outperformed other lifelong learning models and reached nearly the same accuracy as batch-trained models with a fraction of the network capacity. For instance, DEN used only 18.0%, 60.3%, and 11.9% of the parameters of the DNN-STL baseline on MNIST-Variation, CIFAR-100, and AWA, respectively.

Moreover, fine-tuning the DEN-obtained network on all tasks yielded better performance than the batch-trained models, confirming its utility for estimating an effective network structure even when all tasks are available from the start.

Implications and Future Developments

The practical implications of DEN are substantial for dynamic real-world applications such as autonomous driving and robotics, where continuous learning from sequential tasks is critical. Theoretically, DEN offers a principled way to balance memory efficiency against learning capacity in neural networks.

Future research could delve into more sophisticated mechanisms for network expansion and pruning, ensuring continual improvement in both efficiency and performance. Additionally, exploring the integration of DEN with other forms of adaptive machine learning systems, such as reinforcement learning frameworks, could further enhance the adaptability and utility of lifelong learning models.

Through strategic innovations in selective retraining, dynamic network capacity expansion, and effective semantic drift prevention, DEN presents a significant advancement in the field of lifelong learning, validating its efficacy across diverse datasets and application scenarios.