
Training Neural Networks with Local Error Signals (1901.06656v2)

Published 20 Jan 2019 in stat.ML, cs.CV, and cs.LG

Abstract: Supervised training of neural networks for classification is typically performed with a global loss function. The loss function provides a gradient for the output layer, and this gradient is back-propagated to hidden layers to dictate an update direction for the weights. An alternative approach is to train the network with layer-wise loss functions. In this paper we demonstrate, for the first time, that layer-wise training can approach the state-of-the-art on a variety of image datasets. We use single-layer sub-networks and two different supervised loss functions to generate local error signals for the hidden layers, and we show that the combination of these losses helps with optimization in the context of local learning. Using local errors could be a step towards more biologically plausible deep learning because the global error does not have to be transported back to hidden layers. A completely backprop-free variant outperforms previously reported results among methods aiming for higher biological plausibility. Code is available at https://github.com/anokland/local-loss

Citations (216)

Summary

  • The paper's main contribution is proposing a layer-wise training method using local loss functions instead of global backpropagation.
  • It demonstrates that combining local cross-entropy and similarity matching losses (predsim loss) achieves competitive performance on benchmarks like CIFAR-10.
  • The approach offers a biologically plausible alternative that mitigates backward-locking and improves memory efficiency through parallel weight updates.

Training Neural Networks with Local Error Signals

The paper "Training Neural Networks with Local Error Signals," authored by Arild Nøkland and Lars H. Eidnes, explores an alternative approach to supervised training of neural networks, diverging from the conventional use of a global loss function. Historically, neural networks for classification tasks have been trained using global backpropagation of error signals derived from a single, overarching loss function. This approach, while effective, suffers from limitations such as backward-locking and memory inefficiencies, and lacks biological plausibility. The authors propose a layer-wise training approach using local error signals that address these issues and achieve competitive performance across several challenging datasets.

Methodology and Approach

The primary innovation presented in this paper is the use of layer-wise training with local loss functions. The approach eschews the traditional backpropagation of a global error by employing local classifiers and two different supervised loss functions—local cross-entropy (prediction loss) and similarity matching loss—to generate local error signals. Depending on the architecture and dataset, these losses are either used independently or combined to form a predsim loss, which has demonstrated superior performance.
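To make the "prediction loss" half concrete, the sketch below shows a hidden block that feeds its activations through an auxiliary linear classifier and computes a local cross-entropy loss. This is a minimal illustration in PyTorch, not the authors' code: the class name LocalBlock, the use of a fully connected layer, and the argument names are assumptions for exposition.

```python
# Minimal sketch (assumed names, not the authors' code): a hidden block whose
# error signal comes from its own auxiliary classifier rather than from a
# global loss back-propagated through later layers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBlock(nn.Module):
    def __init__(self, in_features, out_features, num_classes):
        super().__init__()
        self.layer = nn.Linear(in_features, out_features)
        # Auxiliary classifier used only to generate the local error signal.
        self.aux_classifier = nn.Linear(out_features, num_classes)

    def forward(self, x, targets=None):
        h = F.relu(self.layer(x))
        local_loss = None
        if targets is not None:
            # Local cross-entropy ("prediction") loss for this layer only.
            local_loss = F.cross_entropy(self.aux_classifier(h), targets)
        return h, local_loss
```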

Layer-wise training offers several pragmatic benefits: it mitigates the backward-locking problem, allows activation memory to be reused, and enables hidden-layer weights to be updated in parallel with the forward pass. Local errors also make greedy layer-wise training possible, further reducing the memory footprint.
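The update pattern can be sketched as follows, assuming the LocalBlock module above: each block steps its own optimizer on its local loss, and detaching the activations between blocks keeps gradients from crossing block boundaries, which is what removes backward-locking and lets earlier activations be freed early. This is an illustrative sketch of the general idea, not the paper's training script.

```python
# Sketch of a layer-wise update loop (illustrative, assumes LocalBlock above).
import torch

def train_step(blocks, optimizers, x, targets):
    h = x
    for block, opt in zip(blocks, optimizers):
        h, local_loss = block(h, targets)
        opt.zero_grad()
        local_loss.backward()   # gradient stays inside this block
        opt.step()
        h = h.detach()          # no gradient flows back to earlier blocks
    return h
```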

A key aspect of the proposed method is the similarity matching loss, inspired by representational similarity measures from neuroscience, which encourages examples from the same class to have similar hidden representations and examples from different classes to have dissimilar ones. The loss measures the pairwise cosine similarities between examples within each mini-batch and matches them to the similarity structure of the labels; combined with the prediction loss, it optimizes both the separation and the clustering of class representations.
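A minimal sketch of such a loss is given below: the cosine-similarity matrix of the mini-batch activations is pushed toward the cosine-similarity matrix of the one-hot targets, and the result is mixed with the local cross-entropy to form a predsim-style loss. The weighting hyperparameter beta and its default value are assumptions for illustration, not taken from the paper's code, and the paper applies additional preprocessing (e.g. for convolutional feature maps) that this sketch omits.

```python
# Sketch of a similarity-matching loss and a predsim-style combination
# (illustrative; beta and function names are assumed).
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(z):
    z = F.normalize(z.flatten(1), dim=1)   # unit-norm rows
    return z @ z.t()                       # (batch, batch) pairwise similarities

def similarity_matching_loss(h, targets, num_classes):
    sim_h = cosine_similarity_matrix(h)
    sim_y = cosine_similarity_matrix(F.one_hot(targets, num_classes).float())
    return F.mse_loss(sim_h, sim_y)

def predsim_loss(logits, h, targets, num_classes, beta=0.99):
    pred = F.cross_entropy(logits, targets)          # local prediction loss
    sim = similarity_matching_loss(h, targets, num_classes)
    return (1.0 - beta) * pred + beta * sim
```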

Empirical Validation

The authors validated their approach on a variety of image classification datasets including MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN, using architectures ranging from fully connected networks to more sophisticated VGG-like networks. Notably, the predsim loss achieved test errors highly competitive with global backpropagation and often superior to other local training strategies.

One noteworthy result is the performance of the backpropagation-free variant of the method, especially on CIFAR-10, where a test error of 7.8% was reached without traditional backpropagation mechanisms. This variant aligns more closely with biological plausibility by avoiding weight transport problems and employing local Hebbian-like learning principles.

Implications and Future Directions

This research paves the way for more biologically plausible and efficient training algorithms for neural networks. By removing the dependency on global backpropagation, the approach opens possibilities for novel, scalable training strategies suitable for large-scale and real-time applications, potentially utilizing parallelized and distributed computing resources more effectively.

The implications also extend to theory, offering insight into alternative optimization landscapes and architectures that prioritize generalization over mere minimization of training error. Future work could explore how far these layer-wise strategies scale, potentially to more complex datasets such as ImageNet, and how they affect robustness and generalization in real-world applications.

In summary, by demonstrating competitive results with local error signals, this work challenges traditional assumptions about how neural networks must be trained and points toward methods that align more closely with neurobiological processes while improving both the efficiency and the performance of machine learning systems.
