
Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations (1810.07411v4)

Published 17 Oct 2018 in cs.NE and cs.LG

Abstract: Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications. However, training these models often relies on back-propagation through time, which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of back-propagation itself does not permit the use of non-differentiable activation functions and is inherently sequential, making parallelization of the underlying training process difficult. Here, we propose the Parallel Temporal Neural Coding Network (P-TNCN), a biologically inspired model trained by the learning algorithm we call Local Representation Alignment. It aims to resolve the difficulties and problems that plague recurrent networks trained by back-propagation through time. The architecture requires neither unrolling in time nor the derivatives of its internal activation functions. We compare our model and learning procedure to other back-propagation through time alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization. We show that it outperforms these on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we denote as Bouncing NotMNIST, and Penn Treebank. Notably, our approach can in some instances outperform full back-propagation through time as well as variants such as sparse attentive back-tracking. Significantly, the hidden unit correction phase of P-TNCN allows it to adapt to new datasets even if its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior generative knowledge when faced with a task sequence. We present results that show the P-TNCN's ability to conduct zero-shot adaptation and online continual sequence modeling.
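
To make the abstract's core idea concrete, below is a minimal, illustrative sketch of a locally trained recurrent layer in the spirit of P-TNCN and Local Representation Alignment: the model predicts the next input, measures a local error, corrects its hidden state, and updates weights with purely local outer-product rules, with no unrolling in time and no activation derivatives. The matrix names, the sign activation, the correction step size `beta`, and the learning rate are assumptions chosen for illustration; they are not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W_in  = 0.1 * rng.standard_normal((n_hid, n_in))   # input -> hidden
W_rec = 0.1 * rng.standard_normal((n_hid, n_hid))  # hidden -> hidden
W_out = 0.1 * rng.standard_normal((n_in, n_hid))   # hidden -> predicted next input

lr, beta = 0.01, 0.1  # assumed hyperparameters for this sketch

def step(x_t, x_next, z_prev):
    """One online update: predict, measure local errors, correct the hidden
    state, then apply local (outer-product) weight updates."""
    # 1. Forward pass with a non-differentiable activation (sign).
    z = np.sign(W_in @ x_t + W_rec @ z_prev)   # hidden state
    x_pred = W_out @ z                          # prediction of the next input

    # 2. Local prediction error at the output layer.
    e_out = x_pred - x_next

    # 3. Hidden-state correction: nudge z toward a target that reduces e_out
    #    (a stand-in for the "hidden unit correction phase" in the abstract).
    z_target = z - beta * (W_out.T @ e_out)
    e_hid = z - z_target

    # 4. Purely local weight updates (outer products of errors and activities);
    #    no back-propagation through time is performed.
    W_out[...] -= lr * np.outer(e_out, z)
    W_in[...]  -= lr * np.outer(e_hid, x_t)
    W_rec[...] -= lr * np.outer(e_hid, z_prev)
    return z

# Toy usage: model a random sequence online, one step at a time.
seq = rng.standard_normal((20, n_in))
z_prev = np.zeros(n_hid)
for t in range(len(seq) - 1):
    z_prev = step(seq[t], seq[t + 1], z_prev)
```

Because every update depends only on quantities available at the current step, the procedure runs fully online, which is the property the abstract credits for zero-shot adaptation and continual sequence modeling.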

Citations (62)
