- The paper introduces DEEP R, an algorithm that trains very sparse deep networks by dynamically rewiring connections during gradient descent.
- It combines gradient-based optimization with stochastic sampling so that a hard limit on the number of active connections is respected from the start of training.
- Experimental results on MNIST, CIFAR-10, and TIMIT show high accuracy with as little as 2% connectivity, matching or outperforming conventional pruning methods at high sparsity.
Deep Rewiring: Training Very Sparse Deep Networks
The objective of this paper is to present DEEP R, an algorithm for training deep neural networks under strict connectivity constraints, enabling efficient implementations on both general-purpose and neuromorphic hardware. The authors combine gradient-based optimization with stochastic sampling to dynamically rewire network connections during supervised training, so that the total number of active connections remains bounded throughout the process.
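To make the connectivity constraint concrete, here is a minimal sketch (not the authors' code) of the connection parameterization DEEP R relies on, assuming NumPy and an illustrative fully connected layer: each potential connection keeps a fixed sign and a real-valued parameter, is active only while that parameter is nonnegative, and exactly K connections are active at initialization.

```python
# Hypothetical parameterization sketch for a single sparse layer.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out = 784, 300            # illustrative layer size
K = int(0.02 * n_in * n_out)      # e.g. keep 2% of connections active

theta = np.full(n_in * n_out, -1.0)            # all connections dormant by default
sign = rng.choice([-1.0, 1.0], size=theta.shape)  # fixed sign per connection

# Activate exactly K randomly chosen connections at initialization.
active_init = rng.choice(theta.size, size=K, replace=False)
theta[active_init] = rng.uniform(0.0, 0.05, size=K)

def effective_weights(theta, sign, shape=(n_in, n_out)):
    """Weights used in the forward pass: dormant entries contribute zero."""
    w = np.where(theta >= 0.0, sign * theta, 0.0)
    return w.reshape(shape)
```

In practice only the K active weights need to be stored and used in the forward pass; the dense matrix here is just for clarity.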
Motivation
The paper emphasizes the growing importance of sparse neural networks in the context of hardware constraints such as the limited memory and energy budgets of accelerators like TPUs and FPGAs. This is especially critical in neuromorphic systems, where memory is even more constrained. Dense connectivity in large networks consumes memory and energy on many connections that contribute little to performance. Existing pruning techniques sparsify the network only after training, so the full dense model must still fit into memory during the training phase, violating strict hardware constraints. DEEP R addresses this issue by enforcing sparse connectivity from the very start of training.
Methodology
DEEP R starts from a network with a specified connection sparsity and dynamically rewires it according to task demands during training. The algorithm is motivated by a theoretical view, inspired by synaptic rewiring in the brain, in which training corresponds to stochastic sampling of network architectures jointly with weight updates; rewiring is thus an integral part of learning rather than a post-training optimization step. In the authors' pseudo-code, each potential connection k is assigned a fixed sign s_k and a real-valued parameter θ_k. A connection is active only while θ_k ≥ 0, and its effective weight is a rectified linear function of the parameter, w_k = s_k · max(θ_k, 0). Active parameters are updated by gradient descent combined with a random walk (Gaussian noise) and a sparsity-inducing prior; whenever a parameter becomes negative, that connection turns dormant and a randomly chosen dormant connection is activated in its place, keeping the total number of active connections constant. The authors also analyze soft-DEEP R, a relaxed variant that drops the hard limit on the number of active connections.
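Continuing the parameterization sketch above, the following is a hedged sketch of a single DEEP R update step as just described, not the authors' implementation. The learning rate eta, prior strength alpha, and temperature T are illustrative values, and grad_w is assumed to be the flattened gradient of the loss with respect to the effective weights.

```python
def deep_r_step(theta, sign, grad_w, K, eta=0.05, alpha=1e-4, T=1e-3, rng=None):
    """One illustrative DEEP R update on flat parameter and sign arrays."""
    if rng is None:
        rng = np.random.default_rng()
    active = theta >= 0.0

    # Gradient descent + L1-style prior + Gaussian noise on active connections.
    # Chain rule: since w_k = s_k * theta_k for active k, dE/dtheta_k = s_k * dE/dw_k.
    noise = np.sqrt(2.0 * eta * T) * rng.standard_normal(theta.shape)
    theta = np.where(active,
                     theta - eta * sign * grad_w - eta * alpha + noise,
                     theta)

    # Connections whose parameter crossed zero become dormant.
    active = theta >= 0.0
    n_missing = K - int(active.sum())

    # Rewiring: reactivate the same number of randomly chosen dormant
    # connections (restarting at theta = 0) so exactly K stay active.
    if n_missing > 0:
        dormant_idx = np.flatnonzero(~active)
        revived = rng.choice(dormant_idx, size=n_missing, replace=False)
        theta[revived] = 0.0
    return theta

# Example usage with a placeholder gradient standing in for backprop output.
grad_w = 0.01 * rng.standard_normal(theta.shape)
theta = deep_r_step(theta, sign, grad_w, K, rng=rng)
```

Because dormant parameters are left untouched and reactivated connections restart at zero, the number of active connections never exceeds K, which is the property that keeps memory usage bounded during training.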
Experimental Validation
The paper reports strong empirical results on several benchmarks: MNIST, CIFAR-10, and the TIMIT speech corpus. In these experiments, DEEP R achieved large reductions in connectivity with only small drops in accuracy. For example, on MNIST, DEEP R reached over 96% accuracy using only 2% of the connections of the corresponding fully connected network. Compared with state-of-the-art pruning techniques, DEEP R matched or even improved performance, particularly under very sparse conditions.
Implications and Future Directions
The implications of this research span both theory and practice. Theoretically, the results offer new insights into biological learning, particularly synaptic rewiring in the brain. Practically, DEEP R opens up new possibilities for efficient deep learning on memory-constrained hardware. As demand for real-time, on-device inference continues to grow, algorithms like DEEP R will become increasingly important.
Future work could adapt DEEP R to unsupervised or semi-supervised learning settings and integrate it with other optimization techniques to further improve memory efficiency. Reducing the computational overhead of rewiring and applying the method to larger architectures and datasets would likely yield further performance gains and broader applicability.
The paper reflects a broader shift in deep learning research toward models that not only perform well but also meet the resource constraints of current and emerging computing platforms.