Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
126 tokens/sec
GPT-4o
28 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Increasing Depth of Neural Networks for Life-long Learning (2202.10821v2)

Published 22 Feb 2022 in cs.LG

Abstract: Purpose: We propose a novel method for continual learning based on the increasing depth of neural networks. This work explores whether extending neural network depth may be beneficial in a life-long learning setting. Methods: We propose a novel approach based on adding new layers on top of existing ones to enable the forward transfer of knowledge and adapting previously learned representations. We employ a method of determining the most similar tasks for selecting the best location in our network to add new nodes with trainable parameters. This approach allows for creating a tree-like model, where each node is a set of neural network parameters dedicated to a specific task. The Progressive Neural Network concept inspires the proposed method. Therefore, it benefits from dynamic changes in network structure. However, Progressive Neural Network allocates a lot of memory for the whole network structure during the learning process. The proposed method alleviates this by adding only part of a network for a new task and utilizing a subset of previously trained weights. At the same time, we may retain the benefit of PNN, such as no forgetting guaranteed by design, without needing a memory buffer. Results: Experiments on Split CIFAR and Split Tiny ImageNet show that the proposed algorithm is on par with other continual learning methods. In a more challenging setup with a single computer vision dataset as a separate task, our method outperforms Experience Replay. Conclusion: It is compatible with commonly used computer vision architectures and does not require a custom network structure. As an adaptation to changing data distribution is made by expanding the architecture, there is no need to utilize a rehearsal buffer. For this reason, our method could be used for sensitive applications where data privacy must be considered.

Citations (8)

Summary

We haven't generated a summary for this paper yet.