Structured Bayesian Pruning via Log-Normal Multiplicative Noise (1705.07283v2)

Published 20 May 2017 in stat.ML

Abstract: Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude to different parts of the neural network during training. It was recently shown that Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In the paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs. To do this we inject noise to the neurons outputs while keeping the weights unregularized. We establish the probabilistic model with a proper truncated log-uniform prior over the noise and truncated log-normal variational approximation that ensures that the KL-term in the evidence lower bound is computed in closed-form. The model leads to structured sparsity by removing elements with a low SNR from the computation graph and provides significant acceleration on a number of deep neural architectures. The model is easy to implement as it can be formulated as a separate dropout-like layer.

Citations (188)

Summary

  • The paper’s main contribution is a Bayesian model that induces structured sparsity by applying log-normal multiplicative noise to entire neurons or channels.
  • It leverages a closed-form KL-divergence in a variational inference framework, enabling efficient pruning and computational acceleration.
  • Experiments on architectures like LeNet and VGG on MNIST and CIFAR-10 show substantial compression and speedup with negligible accuracy drop.

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

The paper "Structured Bayesian Pruning via Log-Normal Multiplicative Noise" addresses a significant challenge in deep neural network (DNN) design: the balance between model complexity and computational efficiency. The authors introduce a novel Bayesian model that induces structured sparsity in neural networks, thereby enhancing computational efficiency without adversely impacting model accuracy.

Core Contribution and Methodology

The core contribution of the paper is a structured sparsity-inducing model based on Bayesian principles. The approach extends dropout-based regularization by injecting truncated log-normal multiplicative noise into neuron outputs while leaving the weights themselves unregularized. The key advance is the ability to remove entire neurons or convolutional channels, yielding structured sparsity rather than the unstructured, weight-level sparsity achieved by earlier sparsifying dropout techniques.

  1. Probabilistic Model and Inference:
    • The paper places a truncated log-uniform prior over the multiplicative noise on neuron outputs and pairs it with a truncated log-normal variational posterior, which ensures that the KL-divergence term in the evidence lower bound can be computed in closed form.
    • Variational inference is carried out with the stochastic gradient variational Bayes (SGVB) estimator, so the network weights and the noise parameters are trained jointly by standard stochastic optimization.
  2. Structured Sparsity and Acceleration:
    • By controlling noise variables at the neuron or channel level, this approach encourages the elimination of elements with a low Signal-to-Noise Ratio (SNR), directly resulting in a simplified computation graph.
    • This structured sparsity translates to significant computational acceleration, facilitating real-time applications in resource-constrained environments.
  3. Flexibility and Implementation:
    • The proposed dropout-like layer is modular and can be easily implemented and integrated with existing neural networks.
    • Both the mean and the variance of the noise are trainable parameters, which lets the layer adapt how strongly each unit is suppressed; a minimal sketch of such a layer is given below.
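
To make the mechanics concrete, the following is a minimal PyTorch-style sketch of such a dropout-like pruning layer. It is an illustration, not the authors' implementation: it samples per-unit multiplicative noise from a truncated log-normal posterior by reparameterization, evaluates the closed-form KL against a truncated log-uniform prior (log(b - a) minus the entropy of the truncated normal in log-space), and zeroes out low-SNR units at test time. The class name, the default truncation interval, and the SNR threshold are illustrative assumptions.

```python
# Minimal, hypothetical sketch of a structured Bayesian pruning layer
# (illustration only, not the authors' released implementation).
import math
import torch
import torch.nn as nn


def normal_cdf(x):
    """Standard normal CDF, written with torch.erf."""
    return 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))


def normal_icdf(p):
    """Standard normal inverse CDF, written with torch.erfinv."""
    return math.sqrt(2.0) * torch.erfinv(2.0 * p - 1.0)


def normal_pdf(x):
    """Standard normal density."""
    return torch.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


class StructuredBayesianPruning(nn.Module):
    """Dropout-like layer: each output unit (neuron or channel) is multiplied by
    noise theta whose log has a truncated normal variational posterior on
    [log_min, log_max]; the prior on theta is truncated log-uniform on the same
    interval. The interval and SNR threshold are illustrative defaults."""

    def __init__(self, num_units, log_min=-20.0, log_max=0.0, snr_threshold=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_units))                 # mean of log(theta)
        self.log_sigma = nn.Parameter(torch.full((num_units,), -3.0))  # log std of log(theta)
        self.a, self.b = log_min, log_max
        self.snr_threshold = snr_threshold

    def _standardized(self):
        sigma = self.log_sigma.exp()
        alpha = (self.a - self.mu) / sigma
        beta = (self.b - self.mu) / sigma
        z = (normal_cdf(beta) - normal_cdf(alpha)).clamp(min=1e-8)     # truncation mass
        return alpha, beta, sigma, z

    def forward(self, x):
        alpha, beta, sigma, z = self._standardized()
        if self.training:
            # Reparameterized sample of log(theta) from the truncated normal.
            u = torch.rand_like(self.mu)
            log_theta = self.mu + sigma * normal_icdf(normal_cdf(alpha) + u * z)
            theta = log_theta.exp()
        else:
            # Test time: use the mean of the truncated log-normal and hard-prune
            # units whose signal-to-noise ratio falls below the threshold.
            mean = (self.mu + 0.5 * sigma ** 2).exp() * \
                   (normal_cdf(beta - sigma) - normal_cdf(alpha - sigma)) / z
            theta = mean * (self.snr() > self.snr_threshold).float()
        # Broadcast the per-unit noise over the remaining dimensions (e.g. NCHW).
        shape = [1, -1] + [1] * (x.dim() - 2)
        return x * theta.view(shape)

    def snr(self):
        """E[theta] / std[theta], from the first two moments of the truncated log-normal."""
        alpha, beta, sigma, z = self._standardized()
        m1 = (self.mu + 0.5 * sigma ** 2).exp() * \
             (normal_cdf(beta - sigma) - normal_cdf(alpha - sigma)) / z
        m2 = (2.0 * self.mu + 2.0 * sigma ** 2).exp() * \
             (normal_cdf(beta - 2.0 * sigma) - normal_cdf(alpha - 2.0 * sigma)) / z
        var = (m2 - m1 ** 2).clamp(min=1e-12)
        return m1 / var.sqrt()

    def kl(self):
        """Closed-form KL(q || p): log(b - a) minus the entropy of the truncated
        normal over log(theta), since the truncated log-uniform prior has
        density 1 / (b - a) in log-space."""
        alpha, beta, sigma, z = self._standardized()
        entropy = torch.log(math.sqrt(2.0 * math.pi * math.e) * sigma * z) + \
                  (alpha * normal_pdf(alpha) - beta * normal_pdf(beta)) / (2.0 * z)
        return (math.log(self.b - self.a) - entropy).sum()
```

Because the noise acts per neuron or per channel, any unit whose deterministic test-time multiplier is zero can be removed from the computation graph outright, together with its incoming and outgoing weights.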

Experimental Evaluation

The authors validate their approach on LeNet-style and VGG-like architectures trained on MNIST and CIFAR-10. The results show substantial compression and acceleration, with a high degree of structured sparsity and a negligible drop in accuracy. Notably, the experiments indicate that optimizing both the mean and the variance of the multiplicative noise yields tighter variational bounds and enhanced sparsity.
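
Concretely, that tighter bound comes from training both the mean and the log-standard-deviation of the noise through the SGVB objective, i.e. the task loss plus the closed-form KL of every pruning layer. The snippet below is a hedged usage sketch assuming the StructuredBayesianPruning layer defined earlier; the architecture, the MNIST-style `train_loader`, and the per-example KL scaling are illustrative assumptions.

```python
# Hypothetical training loop (illustration only): the SGVB objective adds the
# closed-form KL of every pruning layer to the task loss, so the noise mean and
# variance are learned jointly with the network weights.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 300), nn.ReLU(), StructuredBayesianPruning(300),
    nn.Linear(300, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for images, labels in train_loader:   # train_loader is assumed to yield (x, y) batches
    data_loss = criterion(model(images), labels)
    kl = sum(m.kl() for m in model.modules()
             if isinstance(m, StructuredBayesianPruning))
    loss = data_loss + kl / len(train_loader.dataset)   # stochastic estimate of -ELBO per example
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```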

  1. Comparison with Other Techniques:
    • Compared against other sparsity-inducing techniques such as Sparse Variational Dropout (SparseVD) and Structured Sparsity Learning (SSL), the proposed method achieves larger speedups and lower memory usage while maintaining competitive or better accuracy.
  2. Avoidance of Overfitting:
    • A compelling finding is that, unlike conventional regularization methods such as binary dropout, the proposed Bayesian approach does not overfit a training set with random labels, underscoring its robustness as a regularizer.

Implications and Future Directions

The implications of Structured Bayesian Pruning are substantial, particularly for deploying DNNs on edge devices where computational power and memory are limited. The approach offers a way to balance architectural efficiency with model performance and may shift model selection strategies toward more Bayesian designs.

Future Work and Speculation:

  • Further exploration into adjusting the scale of KL-divergence for individual layers could yield even better sparsity and acceleration results.
  • Expanding this methodology to other domains within deep learning, such as recurrent neural networks (RNNs) or transformers, might uncover additional efficiencies.
  • As machine learning models grow in complexity, integrating such structured pruning techniques could play a crucial role in ensuring sustainable AI development.

The paper ultimately provides a rigorous framework with practical advantages for optimizing neural network architectures, thereby making significant contributions to both theoretical and applied aspects of deep learning research.
