Scaling Supervised Local Learning with Augmented Auxiliary Networks (2402.17318v1)

Published 27 Feb 2024 in cs.NE, cs.CV, and cs.LG

Abstract: Deep neural networks are typically trained using global error signals that backpropagate (BP) end-to-end, which is not only biologically implausible but also suffers from the update locking problem and requires huge memory consumption. Local learning, which updates each layer independently with a gradient-isolated auxiliary network, offers a promising alternative to address the above problems. However, existing local learning methods are confronted with a large accuracy gap with the BP counterpart, particularly for large-scale networks. This is due to the weak coupling between local layers and their subsequent network layers, as there is no gradient communication across layers. To tackle this issue, we put forward an augmented local learning method, dubbed AugLocal. AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy. We also propose to linearly reduce the depth of auxiliary networks as the hidden layer goes deeper, ensuring sufficient network capacity while reducing the computational cost of auxiliary networks. Our extensive experiments on four image classification datasets (i.e., CIFAR-10, SVHN, STL-10, and ImageNet) demonstrate that AugLocal can effectively scale up to tens of local layers with a comparable accuracy to BP-trained networks while reducing GPU memory usage by around 40%. The proposed AugLocal method, therefore, opens up a myriad of opportunities for training high-performance deep neural networks on resource-constrained platforms. Code is available at https://github.com/ChenxiangMA/AugLocal.

Authors (4)
  1. Chenxiang Ma (12 papers)
  2. Jibin Wu (42 papers)
  3. Chenyang Si (36 papers)
  4. Kay Chen Tan (83 papers)

Summary

  • The paper introduces AugLocal, a method that uses augmented auxiliary networks to narrow the performance gap between local learning and backpropagation.
  • It introduces a pyramidal structure that couples hidden and downstream layers for efficient, parallelized training.
  • Empirical evaluations on CIFAR-10, SVHN, STL-10, and ImageNet confirm comparable accuracy with reduced GPU memory usage.

Enhancing Supervised Local Learning with Augmented Auxiliary Networks for Deep Neural Architectures

Introduction

The success of deep neural networks, particularly in pattern recognition tasks, is largely underpinned by the backpropagation (BP) algorithm. Despite its widespread use, BP is biologically implausible and suffers from practical inefficiencies, notably its substantial memory consumption and the update locking problem, which have motivated the search for alternative training methods. Local learning, in which each layer of the network is updated independently, emerges as a viable alternative that sidesteps these pitfalls.

Supervised Local Learning: A Primer

Conventional supervised local learning methods attach a gradient-isolated auxiliary network to each hidden layer so that the layer can be optimized independently. This design inherently circumvents the update locking problem of BP and allows training to be parallelized more efficiently. However, splitting training into independent layer-wise optimization procedures has historically produced a sizable accuracy gap relative to end-to-end BP, chiefly because no gradient information is communicated across layers.
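
The following sketch illustrates this general training scheme: each hidden layer is paired with its own auxiliary classifier, and activations are detached before being handed to the next layer so that gradients remain local. It is a minimal PyTorch illustration; the block structure, auxiliary head, and function names are assumptions for exposition, not the design of any particular published method.

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    """One hidden layer together with its gradient-isolated auxiliary head."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.layer = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Auxiliary network: here simply global pooling plus a linear classifier.
        self.aux_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_ch, num_classes),
        )

    def forward(self, x):
        h = self.layer(x)
        return h, self.aux_head(h)

def local_train_step(blocks, optimizers, x, y):
    """Update each block from its own local loss; activations are detached
    before being passed on, so no gradient flows between blocks."""
    criterion = nn.CrossEntropyLoss()
    for block, opt in zip(blocks, optimizers):
        h, logits = block(x)
        loss = criterion(logits, y)
        opt.zero_grad()
        loss.backward()      # gradients stay within this block and its aux head
        opt.step()
        x = h.detach()       # gradient isolation: the next block sees a constant
    return loss.item()

# Usage sketch: three local blocks trained on random CIFAR-10-sized data.
blocks = [LocalBlock(c_in, c_out, 10) for c_in, c_out in [(3, 32), (32, 64), (64, 128)]]
optimizers = [torch.optim.SGD(b.parameters(), lr=0.1, momentum=0.9) for b in blocks]
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(local_train_step(blocks, optimizers, x, y))
```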

AugLocal: A Novel Approach

To address these limitations, the paper presents AugLocal, a method that strengthens the coupling between each hidden layer and its subsequent layers through the construction of its auxiliary network. Concretely, each hidden layer's auxiliary network is built by uniformly selecting a small subset of the layers that follow it in the primary network, which encourages the hidden layer to learn feature representations that are useful for downstream layers. The method further adopts a pyramidal structure, linearly decreasing the depth of auxiliary networks for deeper hidden layers, to retain sufficient capacity while limiting the computational cost of the auxiliary networks.
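
The sketch below shows how such a construction rule could be expressed in code: the auxiliary depth decays linearly with the hidden layer's position, and the auxiliary layers are drawn uniformly from the layers that follow it in the primary network. The function names, rounding convention, and minimum depth are illustrative assumptions, not the authors' implementation.

```python
def auxiliary_depth(layer_idx, num_hidden_layers, d_max, d_min=1):
    """Linearly decay auxiliary depth from d_max (first layer) to d_min (last)."""
    frac = layer_idx / max(num_hidden_layers - 1, 1)
    return round(d_max - frac * (d_max - d_min))

def select_auxiliary_layers(layer_idx, num_hidden_layers, d_max):
    """Uniformly pick indices of subsequent primary-network layers whose
    structure is copied into this hidden layer's auxiliary network."""
    depth = auxiliary_depth(layer_idx, num_hidden_layers, d_max)
    remaining = list(range(layer_idx + 1, num_hidden_layers))
    if not remaining or depth <= 0:
        return []
    step = (len(remaining) - 1) / max(depth - 1, 1)
    return [remaining[round(i * step)] for i in range(min(depth, len(remaining)))]

# Example: a 10-layer network with a maximum auxiliary depth of 4.
for l in range(9):
    print(l, select_auxiliary_layers(l, num_hidden_layers=10, d_max=4))
```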

Empirical Validation

Extensive evaluations on CIFAR-10, SVHN, STL-10, and ImageNet show that AugLocal substantially narrows the performance gap to BP-trained networks, reaching comparable accuracy while reducing GPU memory usage by around 40%. These results hold across a range of network architectures, including ResNet, VGG, MobileNet, EfficientNet, and RegNet, which underscores AugLocal's potential as a scalable local learning rule applicable to a wide array of deep learning tasks and architectures.

Theoretical Implications and Future Directions

AugLocal's construction of auxiliary networks not only offers a scalable answer to the challenges of supervised local learning in large-scale networks but also provides a foundation for future work in this area. The paper's comparison of hidden representations learned by AugLocal and by BP sheds light on how the two approaches shape intermediate features, opening avenues for further empirical and theoretical study of local learning mechanisms and their optimization.

Practical Considerations and Advancements

In practice, AugLocal paves the way for deploying high-performance deep neural networks on resource-constrained platforms, marking a methodological shift away from traditional BP. Its reduced memory footprint offers tangible benefits for applications where resource allocation is a critical concern, such as edge computing and mobile deployments.

Concluding Remarks

In summary, AugLocal represents a significant methodological advance in supervised local learning. By narrowing the performance gap to BP, reducing memory overhead, and scaling across diverse datasets and network architectures, it marks a notable step in the exploration of alternative neural network training paradigms. Its implications extend beyond the immediate results of this work to the broader pursuit of efficient, scalable, and biologically plausible learning algorithms.
