Training Sparse Neural Networks: An Analysis
The paper "Training Sparse Neural Networks" by Suraj Srinivas, Akshayvarun Subramanya, and R. Venkatesh Babu offers an innovative approach to constructing efficient neural networks by focusing on sparsity. The authors address the commonly observed challenge of over-parameterization in deep neural networks—particularly those used for large-scale tasks like image classification—by introducing methods that exploit sparse computations to achieve better performance.
Contributions and Methodology
The researchers introduce a novel regularization scheme designed to limit the total number of parameters in a neural network, diverging from traditional magnitude-based regularizers such as ℓ1 and ℓ2. The crux of their method is a set of gate variables that perform parameter selection and thereby induce sparsity. Each gate acts as a Bernoulli variable determining whether the corresponding weight is zero or non-zero, yielding a form of sparsity akin to a spike-and-slab prior, a construction often employed in Bayesian statistics for variable selection.
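A compact way to state the gating idea is the following sketch; the notation (w_i for weights, g_i for gates, π_i for gate probabilities) is illustrative rather than copied from the paper:

    % Illustrative formulation: each weight w_i is paired with a binary gate g_i.
    \theta_i = g_i \, w_i, \qquad g_i \in \{0, 1\}, \qquad g_i \sim \mathrm{Bernoulli}(\pi_i)
    % The expected number of non-zero parameters is the sum of the gate probabilities,
    \mathbb{E}\big[\|\theta\|_0\big] = \textstyle\sum_i \pi_i ,
    % so penalizing this sum penalizes the parameter count directly: g_i = 0 plays
    % the role of the "spike" and g_i = 1 that of the "slab".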
The authors define a network's complexity as its total parameter count and pose an optimization problem that minimizes the network's loss function together with this parameter complexity. They reformulate the problem by treating the gate variables as stochastic and learn these binary parameters with an identity back-propagation rule known as the straight-through estimator. This allows weight pruning and network training to proceed simultaneously, improving computational efficiency.
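To make the mechanics concrete, the following PyTorch sketch pairs each weight with a learned gate and trains both with a straight-through estimator. The class name, layer sizes, and the penalty weight lam are illustrative assumptions rather than the authors' reference implementation, and the backward pass here differentiates the gate probabilities, which is one common straight-through variant.

    # Minimal PyTorch sketch of gated weights trained with a straight-through
    # estimator. Names, sizes, and the penalty weight are illustrative
    # assumptions, not the authors' reference implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedLinear(nn.Module):
        """Linear layer whose weights are multiplied by learned binary gates."""
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(out_features))
            # Real-valued gate parameters; a sigmoid maps them to Bernoulli probabilities.
            self.gate_logits = nn.Parameter(torch.zeros(out_features, in_features))

        def forward(self, x):
            probs = torch.sigmoid(self.gate_logits)
            hard = torch.bernoulli(probs.detach())        # sample 0/1 gates
            # Straight-through trick: the forward pass uses the hard samples, the
            # backward pass treats sampling as the identity and differentiates probs.
            gates = (hard - probs).detach() + probs
            return F.linear(x, self.weight * gates, self.bias)

        def expected_param_count(self):
            # Expected number of surviving weights = sum of gate probabilities.
            return torch.sigmoid(self.gate_logits).sum()

    # One toy training step: task loss plus a penalty on the expected parameter
    # count, so pruning and weight learning happen in the same update.
    layer = GatedLinear(784, 10)
    opt = torch.optim.SGD(layer.parameters(), lr=0.1)
    x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
    lam = 1e-4  # sparsity/accuracy trade-off (illustrative value)
    loss = F.cross_entropy(layer(x), y) + lam * layer.expected_param_count()
    opt.zero_grad()
    loss.backward()
    opt.step()

In a full network, each layer would carry its own gates, and weights whose gate probabilities collapse toward zero can be removed outright after training.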
Experimental Validation
Experiments demonstrate the method's efficacy on both small and large networks, notably LeNet-5, AlexNet, and VGG-16, with up to 95.84% of LeNet-5's parameters pruned. The results indicate that the proposed method matches or surpasses the state of the art in network sparsity while maintaining or slightly improving accuracy, for example achieving roughly 14x compression on VGG-16 with negligible accuracy degradation.
A hyperparameter sensitivity analysis shows that, although the initialization of the gate variables and the values of the regularization parameters λ1 and λ2 influence the training dynamics, the method is fairly robust to these choices, suggesting that high sparsity can be reached without a substantial loss of accuracy.
Practical Implications and Future Directions
The implementation indicates that training with sparse matrices can reduce model size and computational requirements, paving the way for deployment in resource-constrained environments such as mobile and embedded systems. Future research might extend these sparsification techniques to real-time applications or adapt them to unsupervised or reinforcement learning settings where model efficiency is critical.
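As a rough illustration of the storage argument (an assumption-based sketch, not something prescribed by the paper), a heavily pruned weight matrix can be held in compressed sparse row (CSR) form and compared against its dense footprint:

    # Illustrative sketch (not from the paper): store a pruned weight matrix in
    # CSR form and compare its memory footprint with the dense version.
    import numpy as np
    from scipy import sparse

    rng = np.random.default_rng(0)
    dense = rng.standard_normal((4096, 4096)).astype(np.float32)
    mask = rng.random(dense.shape) < 0.05   # keep ~5% of weights (~95% sparsity)
    pruned = dense * mask

    csr = sparse.csr_matrix(pruned)
    dense_bytes = pruned.nbytes
    csr_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
    print(f"dense: {dense_bytes / 1e6:.1f} MB, CSR: {csr_bytes / 1e6:.1f} MB")

    # Sparse matrix-vector products also skip the zeroed weights entirely.
    x = rng.standard_normal(4096).astype(np.float32)
    y = csr @ x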
Additionally, this work opens new directions in neural architecture search and optimization, where strategic pruning and growth can improve learning throughput without severely compromising predictive performance, a recurring concern in resource-limited AI applications.
In conclusion, this paper provides a substantive contribution to the field of efficient neural network design, leveraging sparse computations to address traditional limitations of model size and computational demand. The methodologies presented could serve as foundational tools for developing more adaptive and scalable AI systems.