- The paper introduces Orthogonal Weights Modification (OWM) to overcome catastrophic forgetting in sequential task learning.
- It integrates a Context-Dependent Processing (CDP) module that dynamically adjusts neural responses based on contextual cues.
- The framework scales effectively across benchmarks, handling up to 100 tasks with minimal samples per task while achieving state-of-the-art performance.
Continual Learning of Context-dependent Processing in Neural Networks
The paper addresses a crucial limitation in contemporary deep neural networks (DNNs): their propensity to learn static, inflexible input-output mappings, which restricts their application in complex and dynamic environments where these mappings change with context. To overcome this, the authors propose a novel conceptual and methodological framework involving Orthogonal Weights Modification (OWM) combined with a Context-Dependent Processing (CDP) module.
Orthogonal Weights Modification
The OWM algorithm is designed to mitigate catastrophic forgetting—a common failure mode in sequential task learning where acquiring new information degrades performance on previously learned tasks. OWM lets a network learn new tasks by modifying its weights only in directions orthogonal to the subspace spanned by the inputs of previously learned tasks. This is operationalized through a projection matrix applied to the backpropagated gradients, which ensures that updates for new tasks do not destructively interfere with existing knowledge.
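The projection idea can be sketched in a few lines of numpy. The recursive projector update below is one standard way to maintain such a matrix (it resembles a recursive-least-squares update); the variable names and the regularization constant `alpha` are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def owm_update_projector(P, x, alpha=1e-3):
    """Shrink the allowed update subspace to exclude the direction of x.

    P starts as the identity; each call removes the direction of an input
    already used for learning. This recursive form is an assumed sketch,
    not the paper's exact algorithm.
    """
    Px = P @ x                     # project x onto the current free subspace
    k = Px / (alpha + x @ Px)      # gain vector
    return P - np.outer(k, Px)     # remove the component along x

# Toy demo: after learning on task A, constrain a task-B update so it
# (approximately) does not change the response to task A's input.
d = 5
P = np.eye(d)
x_a = rng.normal(size=d)           # an input from a previously learned task
P = owm_update_projector(P, x_a)

grad_b = rng.normal(size=d)        # backprop gradient computed on task B
delta_w = P @ grad_b               # OWM-constrained weight update

# x_a @ delta_w is near zero (exactly zero in the limit alpha -> 0),
# so the old task's input-output mapping is preserved.
```

For a full network, such a projector is maintained per layer and applied to each layer's gradient before the optimizer step.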
Empirical validation was conducted on standard benchmarks such as the shuffled and disjoint MNIST experiments. In these tests, OWM not only matched or exceeded the performance of existing continual learning methods but also demonstrated superior scalability, effectively handling up to 100 sequential tasks. Tests on more complex datasets—CIFAR-10, handwritten Chinese characters, and ImageNet—further illustrated the method's scalability. Notably, the approach supported learning a multitude of tasks with minimal data per task, sometimes as few as 10 samples, pointing toward more sample-efficient learning systems.
Context-Dependent Processing Module
While OWM provides the structural backbone for continual learning, the CDP module extends this foundation by integrating contextual information, enabling the input-output mapping to change with context. This is achieved through a network architecture that emulates aspects of cognitive flexibility in primate brains, where contextual signals modulate sensory feature representations before classification. The CDP module effectively rotates the feature space, allowing the same feature vector to yield different network outputs depending on context.
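A minimal sketch of this idea: a contextual cue multiplicatively gates the feature vector before a shared classifier, so one set of classifier weights produces context-dependent outputs. The sigmoid-gating form, the dimensions, and all variable names here are simplifying assumptions for illustration, not the paper's exact module.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: 8-d sensory features, 3 one-hot contexts, 2 classes.
d_feat, d_ctx, d_out = 8, 3, 2

W_ctx = rng.normal(size=(d_feat, d_ctx))   # context encoder (assumed form)
W_out = rng.normal(size=(d_out, d_feat))   # shared classifier weights

def cdp_forward(features, context):
    """Context-dependent processing sketch: the context signal produces a
    per-feature gain that modulates the features before classification."""
    gate = 1.0 / (1.0 + np.exp(-(W_ctx @ context)))  # sigmoid gain per feature
    modulated = features * gate                       # context-modulated features
    return W_out @ modulated

x = rng.normal(size=d_feat)
ctx_a = np.array([1.0, 0.0, 0.0])
ctx_b = np.array([0.0, 1.0, 0.0])

# The same input produces different outputs under different contexts,
# using a single shared network.
out_a = cdp_forward(x, ctx_a)
out_b = cdp_forward(x, ctx_b)
```

In the full framework, the classifier downstream of this modulation is trained sequentially with OWM, so each new context-dependent rule is learned without overwriting the others.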
Testing CDP in face classification tasks showed the ability to handle 40 different context-dependent mapping rules using a single neural architecture, achieving performance comparable to conventional multi-task training paradigms. This highlights the module's potential for enabling compact systems to extract and apply diverse context-specific rules and processing paths in an efficient manner.
Implications and Future Directions
The proposed framework carries significant implications for both theoretical and practical domains in AI. By emulating the flexibility of biological cognition, the approach promises to improve the adaptability of AI systems in dynamic real-world environments. Theoretical implications include understanding fast concept formation in neural networks, mimicking biological synaptic compartmentalization, and potential contributions to cognitive neuroscience through neural representations exhibiting mixed selectivity.
Future research could extend the CDP-OWM framework to reinforcement learning and to purely unsupervised, more autonomous learning scenarios. Combining it with complementary learning systems could further enhance its flexibility, fostering AI capable of more sophisticated, context-aware lifelong learning. Ultimately, this paper marks a step toward versatile AI systems that adapt to a wide range of tasks with human-like generalization, representing an interdisciplinary advance across AI and cognitive science.