Conditional Channel Gated Networks for Task-Aware Continual Learning (2004.00070v1)

Published 31 Mar 2020 in cs.CV, cs.LG, and stat.ML

Abstract: Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convolutional layer with task-specific gating modules, selecting which filters to apply on the given input. This way, we achieve two appealing properties. Firstly, the execution patterns of the gates allow us to identify and protect important filters, ensuring no loss in the performance of the model for previously learned tasks. Secondly, by using a sparsity objective, we can promote the selection of a limited set of kernels, allowing us to retain sufficient model capacity to digest new tasks. Existing solutions require, at test time, awareness of the task to which each example belongs. This knowledge, however, may not be available in many practical scenarios. Therefore, we additionally introduce a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available. We validate our proposal on four continual learning datasets. Results show that our model consistently outperforms existing methods both in the presence and the absence of a task oracle. Notably, on Split SVHN and Imagenet-50 datasets, our model yields up to 23.98% and 17.42% improvement in accuracy w.r.t. competing methods.

Citations (169)

Summary

Conditional Channel Gated Networks for Task-Aware Continual Learning

The paper presents an innovative framework to address the issue of catastrophic forgetting in Convolutional Neural Networks (CNNs) during sequential learning of tasks. The authors propose Conditional Channel Gated Networks that employ task-specific gating modules to determine the activation of specific filters based on the input. This novel approach ensures that important filters for previously learned tasks are protected while maintaining sufficient model capacity for new tasks, even when task labels are not available during inference.

The key contribution of this work lies in the task-specific gating modules embedded in each convolutional layer. These gating modules dynamically select a subset of filters conditioned on the input feature map, thereby enabling conditional computation and efficient use of model capacity. The framework also incorporates a sparsity objective that encourages each task to use few units, preserving capacity for new tasks while preventing the overwriting of parameters that are critical for older tasks. Finally, a task classifier predicts the task label of each example, removing the need for a task oracle at test time, a notable advance over existing models in class-incremental learning scenarios.
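
For concreteness, below is a minimal PyTorch sketch of such a gated convolutional block, with a per-task gating module and an L1-style sparsity penalty on the gates. This is an illustrative reconstruction, not the authors' implementation: the layer widths, the Gumbel-sigmoid relaxation used to keep the binary gates trainable, and all module names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelGate(nn.Module):
    """Per-task gating module: summarizes the input feature map and decides
    which output channels of the following convolution to keep."""
    def __init__(self, in_channels, out_channels, temperature=2.0 / 3.0):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, 16),   # hidden width is an assumption
            nn.ReLU(inplace=True),
            nn.Linear(16, out_channels),
        )
        self.temperature = temperature

    def forward(self, x):
        # Global average pooling summarizes the input feature map.
        logits = self.mlp(x.mean(dim=(2, 3)))
        if self.training:
            # Gumbel-sigmoid relaxation keeps the binary gates differentiable.
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log1p(-u)
            return torch.sigmoid((logits + noise) / self.temperature)
        # Hard 0/1 decisions at inference time.
        return (logits > 0).float()


class GatedConvBlock(nn.Module):
    """Convolution whose output channels are masked by a task-specific gate."""
    def __init__(self, in_channels, out_channels, num_tasks):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        # One independent gating module per task.
        self.gates = nn.ModuleList(
            ChannelGate(in_channels, out_channels) for _ in range(num_tasks)
        )

    def forward(self, x, task_id):
        g = self.gates[task_id](x)            # (batch, out_channels)
        y = F.relu(self.conv(x))
        return y * g[:, :, None, None], g     # zero out inactive channels


def sparsity_loss(gates, weight=0.5):
    """Penalty on the fraction of active channels across gated layers,
    encouraging each task to claim only a small subset of filters."""
    return weight * torch.stack([g.mean() for g in gates]).mean()
```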

A series of empirical validations on four continual learning datasets, including Split SVHN and Imagenet-50, underscores the efficacy of this approach. The model delivers substantial gains, improving accuracy by up to 23.98% on Split SVHN and 17.42% on Imagenet-50 relative to competing methods. The experimental results show that the model effectively mitigates catastrophic forgetting, performing favorably against other state-of-the-art continual learning algorithms, including EWC-On, LwF, and HAT, across datasets of varying complexity and across learning scenarios.

The research addresses two continual learning settings: task-incremental, where task identifiers are available during both training and inference, and class-incremental, where task identifiers are unavailable during inference. The distinct advantage of the proposed approach is its flexibility and robustness across these settings. For task-incremental learning, the gating mechanism lets the model avoid interference between tasks. For class-incremental learning, a task classifier predicts task labels during inference, trained with rehearsal on either episodic or generative memories.
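
To make the class-incremental inference protocol concrete, the following hypothetical routine (the `backbone`, `task_classifier`, and `heads` objects are assumed, not taken from the paper's code) first queries the task classifier and then routes each example through the gates and output head of its predicted task:

```python
import torch

@torch.no_grad()
def class_incremental_predict(backbone, task_classifier, heads, x):
    # Step 1: no task oracle is available, so guess the task of each example.
    task_ids = task_classifier(x).argmax(dim=1)
    outputs = []
    # Step 2: route each example through the gates and head of its predicted task.
    for i in range(x.size(0)):
        t = int(task_ids[i])
        features = backbone(x[i:i + 1], task_id=t)
        outputs.append(heads[t](features))
    return torch.cat(outputs, dim=0), task_ids
```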

From a practical standpoint, the framework manages computational resources efficiently during forward propagation. The authors highlight that even though the model grows as new tasks are added in the class-incremental setting, the per-example computational cost does not exceed that of the backbone architecture. This makes the model both scalable and resource-efficient.
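
As a rough illustration of how recorded gate activations translate into such a compute estimate (assuming binary gate tensors of shape `(batch, out_channels)` were collected for each gated layer), one can measure the fraction of channels a task actually uses:

```python
def active_channel_fractions(gates_per_layer):
    """For each gated layer, the mean fraction of output channels switched on.
    A value well below 1.0 means the layer runs at only a fraction of the
    backbone's nominal convolutional cost for this task."""
    return [g.float().mean().item() for g in gates_per_layer]

# Hypothetical usage, once gate activations have been recorded per layer:
# fractions = active_channel_fractions(recorded_gates)
```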

Theoretically, this work advances the understanding of conditional computation in neural networks, particularly in the context of lifelong learning. It sets a precedent for coupling dynamic gating mechanisms with sparsity-driven objectives to allocate model capacity strategically. Furthermore, it opens a line of research on predicting task identity with a neural classifier, moving beyond settings where the task is assumed to be known.

Future research directions could explore optimizing the gating network for further resource efficiency, improving the accuracy of dynamic task prediction, or integrating similar conditional computation strategies into other neural architectures. Advances in these areas could benefit autonomous systems that must learn and adapt continuously without relying on explicit task boundaries, broadening their practical applicability.
