- The paper introduces DEEP R, an algorithm that trains very sparse deep networks by dynamically rewiring connections during gradient descent.
- It combines gradient-based optimization with stochastic sampling so that a hard limit on the number of active connections is respected from the start of training.
- Experimental results on MNIST, CIFAR-10, and TIMIT show high accuracy with as little as 2% connectivity, matching or outperforming conventional pruning methods at high sparsity.
Deep Rewiring: Training Very Sparse Deep Networks
The objective of this paper is to present DEEP R, an algorithm for training deep neural networks under strict connectivity constraints, enabling efficient implementations on both general-purpose and neuromorphic hardware. The authors combine gradient-based optimization with stochastic sampling to dynamically rewire network connections during supervised training, so that the total number of active connections remains bounded throughout the process.
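To make the connectivity constraint concrete, here is a minimal sketch (not the authors' code) of the connection parameterization DEEP R relies on, assuming NumPy and an illustrative fully connected layer: each potential connection keeps a fixed sign and a real-valued parameter, is active only while that parameter is nonnegative, and exactly K connections are active at initialization.

```python
# Hypothetical parameterization sketch for a single sparse layer.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 784, 300            # illustrative layer size
K = int(0.02 * n_in * n_out)      # e.g. keep 2% of connections active

theta = np.full(n_in * n_out, -1.0)            # all connections dormant by default
sign = rng.choice([-1.0, 1.0], size=theta.shape)  # fixed sign per connection

# Activate exactly K randomly chosen connections at initialization.
active_init = rng.choice(theta.size, size=K, replace=False)
theta[active_init] = rng.uniform(0.0, 0.05, size=K)

def effective_weights(theta, sign, shape=(n_in, n_out)):
    """Weights used in the forward pass: dormant entries contribute zero."""
    w = np.where(theta >= 0.0, sign * theta, 0.0)
    return w.reshape(shape)
```

In practice only the K active weights need to be stored and used in the forward pass; the dense matrix here is just for clarity.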
Motivation
The paper emphasizes the growing importance of sparse neural networks in the context of hardware constraints such as the limited memory and energy budgets of accelerators like TPUs and FPGAs. This is especially critical in neuromorphic systems, where memory is even more constrained. Dense connectivity in large networks consumes memory and energy on many connections that contribute little to performance. Existing pruning techniques sparsify the network only after training, so the full dense model must still fit into memory during the training phase, violating strict hardware constraints. DEEP R addresses this issue by enforcing sparse connectivity from the very start of training.
Methodology
DEEP R starts from a network with a specified connection sparsity and dynamically rewires it according to task demands during training. The algorithm is motivated by a theoretical view, inspired by synaptic rewiring in the brain, in which training corresponds to stochastic sampling of network architectures jointly with weight updates; rewiring is thus an integral part of learning rather than a post-training optimization step. In the authors' pseudo-code, each potential connection k is assigned a fixed sign s_k and a real-valued parameter θ_k. A connection is active only while θ_k ≥ 0, and its effective weight is a rectified linear function of the parameter, w_k = s_k · max(θ_k, 0). Active parameters are updated by gradient descent combined with a random walk (Gaussian noise) and a sparsity-inducing prior; whenever a parameter becomes negative, that connection turns dormant and a randomly chosen dormant connection is activated in its place, keeping the total number of active connections constant. The authors also analyze soft-DEEP R, a relaxed variant that drops the hard limit on the number of active connections.
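Continuing the parameterization sketch above, the following is a hedged sketch of a single DEEP R update step as just described, not the authors' implementation. The learning rate eta, prior strength alpha, and temperature T are illustrative values, and grad_w is assumed to be the flattened gradient of the loss with respect to the effective weights.

```python
def deep_r_step(theta, sign, grad_w, K, eta=0.05, alpha=1e-4, T=1e-3, rng=None):
    """One illustrative DEEP R update on flat parameter and sign arrays."""
    if rng is None:
        rng = np.random.default_rng()
    active = theta >= 0.0

    # Gradient descent + L1-style prior + Gaussian noise on active connections.
    # Chain rule: since w_k = s_k * theta_k for active k, dE/dtheta_k = s_k * dE/dw_k.
    noise = np.sqrt(2.0 * eta * T) * rng.standard_normal(theta.shape)
    theta = np.where(active,
                     theta - eta * sign * grad_w - eta * alpha + noise,
                     theta)

    # Connections whose parameter crossed zero become dormant.
    active = theta >= 0.0
    n_missing = K - int(active.sum())

    # Rewiring: reactivate the same number of randomly chosen dormant
    # connections (restarting at theta = 0) so exactly K stay active.
    if n_missing > 0:
        dormant_idx = np.flatnonzero(~active)
        revived = rng.choice(dormant_idx, size=n_missing, replace=False)
        theta[revived] = 0.0
    return theta

# Example usage with a placeholder gradient standing in for backprop output.
grad_w = 0.01 * rng.standard_normal(theta.shape)
theta = deep_r_step(theta, sign, grad_w, K, rng=rng)
```

Because dormant parameters are left untouched and reactivated connections restart at zero, the number of active connections never exceeds K, which is the property that keeps memory usage bounded during training.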
Experimental Validation
The paper reports strong empirical results on several benchmarks: MNIST, CIFAR-10, and the TIMIT speech corpus. In these experiments, DEEP R achieved large reductions in connectivity with only small drops in accuracy. For example, on MNIST, DEEP R reached over 96% accuracy using only 2% of the connections of the corresponding fully connected network. Compared with state-of-the-art pruning techniques, DEEP R matched or even improved performance, particularly under very sparse conditions.
Implications and Future Directions
The implications of this research span both theory and practice. Theoretically, the results offer new insights into biological learning, particularly synaptic rewiring in the brain. Practically, DEEP R opens up new possibilities for efficient deep learning on memory-constrained hardware. As demand for real-time, on-device inference continues to grow, algorithms like DEEP R will become increasingly important.
Future work could adapt DEEP R to unsupervised or semi-supervised learning settings and integrate it with other optimization techniques to further improve memory efficiency. Reducing the computational overhead of rewiring and applying the method to larger architectures and datasets would likely yield further performance gains and broader applicability.
The paper reflects a broader shift in deep learning research toward models that not only perform well but also meet the resource constraints of current and emerging computing platforms.