Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting (1904.00310v3)

Published 31 Mar 2019 in cs.LG and cs.CV

Abstract: Addressing catastrophic forgetting is one of the key challenges in continual learning where machine learning systems are trained with sequential or streaming tasks. Despite recent remarkable progress in state-of-the-art deep learning, deep neural networks (DNNs) are still plagued with the catastrophic forgetting problem. This paper presents a conceptually simple yet general and effective framework for handling catastrophic forgetting in continual learning with DNNs. The proposed method consists of two components: a neural structure optimization component and a parameter learning and/or fine-tuning component. By separating the explicit neural structure learning and the parameter estimation, not only is the proposed method capable of evolving neural structures in an intuitively meaningful way, but also shows strong capabilities of alleviating catastrophic forgetting in experiments. Furthermore, the proposed method outperforms all other baselines on the permuted MNIST dataset, the split CIFAR100 dataset and the Visual Domain Decathlon dataset in continual learning setting.

Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting

The paper introduces a novel framework, "Learn to Grow," designed to tackle catastrophic forgetting in continual learning settings. Catastrophic forgetting is a significant challenge in machine learning, wherein models lose previously acquired knowledge upon learning new tasks. The framework mitigates this problem through continual structure learning, and its effectiveness is demonstrated through experiments on several continual learning benchmarks.

Methodology

The "Learn to Grow" framework focuses on structural adaptations to neural networks as new tasks are introduced, preventing degradation in performance on previously learned tasks. By dynamically growing the network structure, the framework effectively manages parameter sharing and task-specific adaptations. The authors employ a balance between reusable parameters and task-specific expansions, ensuring optimal use of resources without compromising performance.

A key component of the method is a parameter loss integrated into the validation objective, which penalizes each structural choice in proportion to the additional parameters it introduces, so new capacity is retained only when it improves performance enough to justify its cost. This penalty controls parameter growth, keeping the model compact while achieving competitive accuracy.
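A minimal sketch of such an objective follows. The normalization by model size and the function names are assumptions made for illustration, and `beta` stands in for the parameter scaling factor discussed in the experiments; the paper's exact form may differ.

```python
import torch.nn as nn
import torch.nn.functional as F


def extra_param_count(option: nn.Module, shared: nn.Module) -> int:
    """Parameters the candidate adds beyond what is already shared with earlier tasks."""
    total = sum(p.numel() for p in option.parameters())
    reused = sum(p.numel() for p in shared.parameters())
    return max(total - reused, 0)


def structure_validation_loss(logits, targets, option, shared, model_size, beta=0.1):
    """Held-out task loss plus a penalty on newly introduced parameters.

    The penalty is normalized by the current model size so it stays on a scale
    comparable to the task loss (an assumption, not necessarily the paper's form).
    """
    task_loss = F.cross_entropy(logits, targets)
    penalty = beta * extra_param_count(option, shared) / model_size
    return task_loss + penalty


# Example arithmetic (hypothetical sizes): a candidate adding 10k parameters
# to a 1M-parameter model contributes 0.1 * 10_000 / 1_000_000 = 0.001 to the loss.
```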

Experimental Evaluation

To validate the approach, the authors conducted extensive experiments on permuted MNIST, split CIFAR-100, and the Visual Domain Decathlon (VDD) dataset. The permuted MNIST experiments showed that the framework maintains high accuracy even as the number of tasks grows well beyond typical settings. On VDD, the proposed method outperformed several baselines while keeping the model size manageable, achieving the best results in five of the ten tasks and excelling particularly on tasks with smaller datasets such as VGG-Flowers and Aircraft.

Furthermore, the sensitivity of the framework to the parameter loss weight was examined across several settings, showing that a scaling factor of 0.1 provided the best compromise between accuracy and model size.

Implications and Future Work

This work presents significant implications for the design of neural networks in lifelong learning applications, where efficient use of model resources and robust performance across diverse tasks are essential. The ability to dynamically adjust network structures based on task requirements offers a practical solution to the prevalent issue of catastrophic forgetting.

Given the encouraging results demonstrated in this paper, future developments could explore further enhancements in adaptive network architectures and the refinement of parameter adjustment techniques. There is also potential in extending this framework to more complex tasks or broader continual learning environments, possibly incorporating reinforcement learning paradigms for improved task ordering strategies.

Overall, the "Learn to Grow" framework offers a valuable contribution to the domain of continual learning, providing insights into the effective management of neural network structures to counteract catastrophic forgetting. As AI systems continue to engage in increasingly complex and varied tasks, such methodological innovations will be critical in advancing the capability and reliability of these systems.

Authors (5)
  1. Xilai Li (15 papers)
  2. Yingbo Zhou (81 papers)
  3. Tianfu Wu (63 papers)
  4. Richard Socher (115 papers)
  5. Caiming Xiong (337 papers)
Citations (385)