Alleviating Catastrophic Forgetting in Neural Networks Through Context-Dependent Gating and Synaptic Stabilization
The paper "Alleviating Catastrophic Forgetting using Context-Dependent Gating and Synaptic Stabilization" explores a critical challenge in training artificial neural networks (ANNs): catastrophic forgetting. This phenomenon occurs when sequential learning of new tasks leads to the erosion of performance on previously trained tasks. The authors offer a dual-component approach inspired by neuroscientific principles to mitigate this challenge: context-dependent gating (XdG) and synaptic stabilization.
Methodological Approach
The paper leverages a multi-faceted strategy that integrates synaptic stabilization techniques with a novel context-dependent gating mechanism. Synaptic stabilization here refers to methods such as Synaptic Intelligence (SI) and Elastic Weight Consolidation (EWC), which preserve the weights critical for previously learned tasks. Stabilization is achieved by adding a quadratic penalty on changes to each weight, scaled by an importance measure (computed online from each weight's contribution to reducing the task loss in SI, or from the Fisher information in EWC).
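To make the idea concrete, the following is a minimal sketch of such a quadratic stabilization penalty, assuming per-weight importance values and post-task anchor weights have already been computed; the function and argument names are illustrative and not the paper's implementation.

```python
import torch

def stabilization_penalty(model, importance, anchor_params, c=1.0):
    """EWC/SI-style quadratic penalty discouraging changes to weights
    deemed important for previously learned tasks (illustrative sketch).

    importance:    dict mapping parameter name -> per-weight importance
    anchor_params: dict mapping parameter name -> weights saved after
                   the previous task finished training
    c:             strength of the stabilization term
    """
    penalty = torch.tensor(0.0)
    for name, param in model.named_parameters():
        if name in importance:
            # Penalize squared deviation from the anchored weights,
            # scaled by how important each weight was for past tasks.
            penalty = penalty + (importance[name] * (param - anchor_params[name]) ** 2).sum()
    return c * penalty

# Training objective for the current task would then be:
# total_loss = task_loss + stabilization_penalty(model, omega, theta_anchor)
```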
The complementary method, context-dependent gating, minimizes interference between tasks by activating a sparse, largely non-overlapping subset of hidden units for each task. Because weight updates are confined to the sub-network active for the current task, weights serving prior tasks remain largely intact. This dual approach is tested extensively on feedforward and recurrent network architectures under both supervised and reinforcement learning paradigms.
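A minimal sketch of this gating scheme is shown below, assuming a single hidden layer and one fixed random gate vector per task; the layer sizes, keep fraction, and class name are illustrative rather than the paper's exact configuration.

```python
import numpy as np
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Feedforward network with context-dependent gating (XdG-style):
    for each task, a fixed random subset of hidden units stays active
    and the rest are silenced during both training and testing."""

    def __init__(self, n_in=784, n_hidden=2000, n_out=10,
                 n_tasks=100, keep_frac=0.2, seed=0):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_out)
        rng = np.random.default_rng(seed)
        # One fixed binary gate vector per task, drawn once before training.
        masks = (rng.random((n_tasks, n_hidden)) < keep_frac).astype(np.float32)
        self.register_buffer("masks", torch.from_numpy(masks))

    def forward(self, x, task_id):
        h = torch.relu(self.fc1(x))
        h = h * self.masks[task_id]   # gate hidden units for this task
        return self.fc2(h)
```

Because each task's gate is fixed and sparse, gradients only flow through its active sub-network, which is what limits overwriting of weights used by earlier tasks.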
Empirical Results
The efficacy of the proposed approach is validated on standard continual learning benchmarks: the permuted MNIST challenge and the ImageNet dataset divided into sequential tasks. On permuted MNIST, combining XdG with synaptic stabilization allowed the network to maintain a high mean accuracy (~95.4%) across 100 sequentially trained tasks. The ImageNet experiments further confirmed the approach, with the combined method clearly outperforming synaptic stabilization alone even as the task domain changed.
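For context, the permuted MNIST sequence is typically constructed by applying a different fixed pixel permutation for each task, as in the following sketch (function and parameter names are illustrative).

```python
import numpy as np

def make_permuted_mnist_tasks(images, n_tasks=100, seed=0):
    """Build a sequence of permuted-MNIST tasks: each task applies a
    fixed random permutation of pixel positions to every image, so the
    classification problem stays the same while input statistics differ.

    images: array of shape (n_samples, 784) with flattened MNIST digits
    """
    rng = np.random.default_rng(seed)
    tasks = []
    for _ in range(n_tasks):
        perm = rng.permutation(images.shape[1])   # one permutation per task
        tasks.append(images[:, perm])             # labels are unchanged
    return tasks
```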
Recurrent networks trained on cognitive neuroscience-inspired tasks likewise showed markedly better task retention with the XdG plus stabilization framework, maintaining high accuracy even when the training paradigm switched from supervised to reinforcement learning.
Theoretical and Practical Implications
Combining context-dependent gating with synaptic stabilization not only addresses catastrophic forgetting but also aligns with principles of continual learning observed in biological brains. From a theoretical standpoint, it suggests that multifaceted approaches, akin to those found in natural systems, hold more promise for mitigating task interference than any single mechanism alone.
Practically, this research bears on training protocols for ANNs in dynamically evolving environments, where retaining past knowledge while efficiently acquiring new information is paramount. The simplicity and computational efficiency of XdG, which requires no major architectural changes and little overhead, increase its potential for real-world applications.
Future Directions
Continued development in this area could further refine the gating mechanism, potentially enhancing transfer learning between related tasks. An interdisciplinary approach incorporating insights from computational neuroscience may yield new solutions for modular task representation in neural networks, emphasizing algorithms that adapt smoothly across related task contexts.
Overall, this paper lays a strong foundation for leveraging biologically inspired strategies to support stable continual learning, opening avenues to extend these methodologies to diverse AI-driven applications.