- The paper introduces CLNP, a novel framework that leverages inactive neurons to avoid interference and eliminate catastrophic forgetting.
- It employs an L1 regularizer and post-training sparsification to maintain high accuracy while freeing network capacity for new tasks.
- Empirical results on permuted MNIST and split CIFAR datasets demonstrate CLNP’s superior performance in managing sequentially learned tasks.
Continual Learning via Neural Pruning: An Expertise-Driven Overview
The paper "Continual Learning via Neural Pruning" by Golkar, Kagan, and Cho presents the Continual Learning via Neural Pruning (CLNP) method, proposing a novel approach for lifelong learning in fixed capacity neural networks. This method addresses the prevalent issue of catastrophic forgetting in continual learning, taking advantage of the over-parametrization characteristic of neural networks through a systematic sparsification process.
Core Contributions and Methodology
The CLNP framework trains tasks sequentially, assigning each new task to the neurons and filters left inactive after the network is sparsified on the previous task. Unlike techniques that rely on weight elasticity to mitigate forgetting, CLNP guarantees zero catastrophic forgetting by ensuring that tasks do not interfere with one another. A distinctive aspect of the method is "graceful forgetting": a controlled, minimal degradation of performance on earlier tasks is accepted in exchange for freeing network capacity for future tasks.
The approach trains the network with activation-based neuron sparsity, so that the unused portion of the network can absorb new tasks without compromising performance on previously learned ones. Neurons are labeled active or free, and the weights connecting them are partitioned into active, free, and interference components, enabling a non-destructive training procedure in which new tasks can still read features learned earlier. This structure also provides diagnostic tools that measure how much of each layer's capacity is used, and it highlights the transferability of features learned in the earlier layers. A sketch of the weight partition is given below.
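The following sketch illustrates how such a partition could be enforced for a single fully connected layer in PyTorch. The helper name and mask variables are illustrative assumptions, not the authors' code: weights flowing into neurons claimed by earlier tasks are frozen, and "interference" weights from newly trainable neurons into those frozen neurons are pinned at zero so that old outputs cannot change.

```python
import torch
import torch.nn as nn

def partition_and_freeze(layer: nn.Linear, active_in: torch.Tensor,
                         active_out: torch.Tensor) -> None:
    """Freeze weights into active outputs and zero out interference weights.

    active_in / active_out are boolean masks over the layer's input / output
    neurons, marking units already claimed by earlier tasks.
    """
    out_f, in_f = layer.weight.shape
    free_in, free_out = ~active_in, ~active_out

    # Interference weights connect free inputs to active outputs. They must stay
    # at zero: otherwise training the free inputs on a new task would shift the
    # activations, and hence the predictions, of earlier tasks.
    interference = active_out.unsqueeze(1) & free_in.unsqueeze(0)
    with torch.no_grad():
        layer.weight[interference] = 0.0

    # Only weights flowing into free output neurons remain trainable; this still
    # lets the new task reuse features produced by active neurons below.
    trainable = free_out.unsqueeze(1).expand(out_f, in_f).float()

    # Mask gradients so frozen weights (and biases of active outputs) never move.
    layer.weight.register_hook(lambda g: g * trainable)
    if layer.bias is not None:
        layer.bias.register_hook(lambda g: g * free_out.float())

# Example: a 784 -> 256 layer where the first 100 inputs and 64 outputs are
# already in use by earlier tasks.
layer = nn.Linear(784, 256)
active_in = torch.zeros(784, dtype=torch.bool); active_in[:100] = True
active_out = torch.zeros(256, dtype=torch.bool); active_out[:64] = True
partition_and_freeze(layer, active_in, active_out)
```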
A key technical component of CLNP is its sparsification scheme, which combines an L1 weight regularizer during training with post-training pruning of neurons based on their average activity. This yields an explicit decision about which neurons and connections remain active after each task, so that network compression is efficient and network resources are managed deliberately. A minimal version of this step is sketched below.
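As a hedged illustration, the snippet below adds an L1 penalty to the task loss and then flags hidden neurons whose mean absolute activation on the task's data falls below a threshold; those neurons could be pruned and released for later tasks. The regularization strength, threshold, and toy layer sizes are our own illustrative choices, not the paper's settings.

```python
import torch
import torch.nn as nn

def l1_penalty(model: nn.Module, strength: float = 1e-4) -> torch.Tensor:
    """L1 term added to the task loss to encourage sparse weight usage.

    During training: loss = criterion(model(x), y) + l1_penalty(model)
    """
    return strength * sum(p.abs().sum() for p in model.parameters())

@torch.no_grad()
def inactive_neuron_mask(activations: torch.Tensor, threshold: float = 1e-2):
    """True where a neuron's mean |activation| over the data falls below threshold."""
    return activations.abs().mean(dim=0) < threshold

# Usage sketch: run (a sample of) the task data through the hidden layer, then
# mark low-activity neurons as prunable / free for the next task.
hidden = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
x = torch.randn(512, 784)                    # stand-in for task inputs
mask = inactive_neuron_mask(hidden(x))
print(f"{int(mask.sum())} of 256 neurons could be freed for future tasks")
```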
Numerical Results and Analysis
The empirical validation of CLNP spans several task sequences derived from benchmark datasets, including permuted MNIST and split CIFAR-10/CIFAR-100. On permuted MNIST, CLNP reaches 98.42% accuracy, closely matching single-task performance and outperforming competing continual learning methods.
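For context, permuted MNIST builds each task by applying a fixed random permutation to the pixel positions of every MNIST image, so all tasks share the same label space but differ in input statistics. The sketch below shows one common way to construct such a task sequence; it is a generic benchmark recipe, not code from the paper.

```python
import torch

def make_permutations(n_tasks: int = 5, n_pixels: int = 28 * 28) -> list:
    """One fixed random pixel permutation per task."""
    return [torch.randperm(n_pixels) for _ in range(n_tasks)]

def apply_task(images: torch.Tensor, perm: torch.Tensor) -> torch.Tensor:
    """Map a batch of (B, 28, 28) digits to (B, 784) permuted inputs."""
    return images.view(images.size(0), -1)[:, perm]

perms = make_permutations()
fake_batch = torch.rand(32, 28, 28)               # stand-in for MNIST images
task3_inputs = apply_task(fake_batch, perms[3])   # inputs as seen by task 3
```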
In the split CIFAR-10/CIFAR-100 experiments, the method shows a considerable improvement over existing approaches, with the multi-head architecture delivering results beyond the previously reported state of the art. An extension of the approach that incorporates fine-tuning further reduces performance degradation, showcasing the potential of enhanced sparsification techniques.
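Split CIFAR experiments typically use a multi-head setup: a shared backbone (the part CLNP sparsifies and partitions) feeds a separate small output head per task, with the task identity known at evaluation time. The toy backbone and layer sizes below are stand-ins chosen for brevity, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self, n_heads: int, classes_per_head: int):
        super().__init__()
        self.backbone = nn.Sequential(        # shared trunk, sparsified per task
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleList(
            nn.Linear(32, classes_per_head) for _ in range(n_heads)
        )

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Route the shared features through the head owned by the given task.
        return self.heads[task_id](self.backbone(x))

# Usage: e.g. task 0 for CIFAR-10 and later tasks for 10-class CIFAR-100 splits.
model = MultiHeadNet(n_heads=6, classes_per_head=10)
logits = model(torch.randn(8, 3, 32, 32), task_id=2)   # batch of 8, task 2
```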
The results also highlight the role of architecture when applying CLNP: layers need to be wide enough that later tasks can be accommodated without noticeable performance degradation. This underscores the importance of feature transferability and reuse across tasks, a recurring theme in the continual learning literature.
Implications and Future Directions
The implications of CLNP are substantial within the AI community, addressing a primary obstacle in the path toward efficient continual learning. This method not only offers a promising solution to the issue of catastrophic forgetting but also paves the way for more informed and diagnostic-driven network management in practical applications. The framework's adaptability in deployment provides a robust foundation for extending its principles to broader architectures and more complex learning paradigms.
Future work could explore different sparsification schemes to further reduce memory and computational costs while maintaining, or even improving, performance. Scaling the CLNP approach to large-scale language models or domain-specific applications could likewise yield insights relevant to specialized AI deployments.
In essence, Continual Learning via Neural Pruning offers a substantial step toward more adaptable and sustainable machine learning models, contributing to the broader goal of AI systems that can competently learn and generalize across diverse, sequentially presented tasks.