Data-Driven Sparse Structure Selection for Deep Neural Networks
The paper "Data-Driven Sparse Structure Selection for Deep Neural Networks" by Zehao Huang and Naiyan Wang addresses a critical issue in the deployment of deep convolutional neural networks (CNNs): their high computational complexity. The paper proposes a novel framework to design an efficient network that balances computational cost and performance without extensive manual interventions typical of traditional model optimization processes.
Methodology Overview
The authors introduce a sparse structure selection technique for deep neural networks built around a new set of parameters called scaling factors. Each factor scales the output of a particular structure, such as a neuron, a group, or a residual block. Sparsity regularization is applied to these scaling factors during training, and optimization is performed with a modified stochastic Accelerated Proximal Gradient (APG) method. The regularization drives some scaling factors exactly to zero, and the corresponding structures can then be pruned, effectively reducing the model's complexity.
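In rough terms (the notation here is a paraphrase rather than the paper's exact symbols, with gamma standing for the sparsity strength), the objective and the soft-thresholding operator behind the proximal step can be written as:

```latex
% Loss over weights W and scaling factors \lambda, with weight decay R(W)
% and an l1 sparsity penalty on \lambda (\gamma = sparsity strength):
\min_{W,\,\lambda}\;
  \frac{1}{N}\sum_{i=1}^{N}\mathcal{L}\bigl(y_i,\;\mathrm{Net}(x_i;\,W,\lambda)\bigr)
  + R(W) + \gamma\,\lVert\lambda\rVert_1

% Proximal operator of \gamma\lVert\cdot\rVert_1 (soft-thresholding),
% applied elementwise to \lambda inside the APG update:
\mathcal{S}_{\gamma}(z) = \operatorname{sign}(z)\,\max\bigl(\lvert z\rvert - \gamma,\, 0\bigr)
```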
Unlike approaches that require labor-intensive alternation between pruning and fine-tuning, this method prunes the model in a single training pass, pointing to a streamlined model development workflow. The framework is also adaptive: the pruned configuration can be tailored to the task requirements and the available computational budget.
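As a concrete illustration, here is a minimal PyTorch-style sketch (not the authors' code; module and variable names are invented for illustration) of a residual block whose output is scaled by a learnable factor, together with a simplified soft-thresholding update standing in for the proximal step. The full APG update in the paper also carries a momentum term, which is omitted here for brevity.

```python
import torch
import torch.nn as nn


class ScaledResidualBlock(nn.Module):
    """Residual block whose residual branch is multiplied by a learnable
    scaling factor; once the factor reaches zero, the block reduces to an
    identity mapping and can be pruned."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.scale = nn.Parameter(torch.ones(1))  # the scaling factor lambda

    def forward(self, x):
        return x + self.scale * self.body(x)


def soft_threshold_(scales, gamma, lr):
    """In-place soft-thresholding of the scaling factors: the proximal
    operator of the l1 penalty gamma * |lambda|, applied after a gradient
    step with step size lr (the momentum term of the full APG update is
    omitted in this sketch)."""
    with torch.no_grad():
        for s in scales:
            s.copy_(torch.sign(s) * torch.clamp(s.abs() - lr * gamma, min=0.0))
```

After training, any block whose scale is exactly zero contributes nothing to the output and can be removed outright, which is what makes single-pass pruning possible.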
Empirical Evaluation
The effectiveness of the framework is evaluated on several state-of-the-art CNN architectures, including VGG, ResNet, and ResNeXt. The results demonstrate significant improvements in computational efficiency:
- VGG Models: Achieved approximately 30-50% reduction in FLOPs and parameter count on CIFAR datasets with marginal loss in accuracy.
- ResNet and ResNeXt Models: Delivered up to 2.5x acceleration on CIFAR datasets and showed promising results in pruning experiments on ImageNet with ResNet-50 and ResNeXt-50 models.
- Pruned network structures whose depth and width adapt to dataset-specific constraints rather than being fixed by hand; the sketch after this list illustrates how zeroed scaling factors translate into removed blocks.
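Building on the hypothetical ScaledResidualBlock above, a post-training pruning pass could simply discard blocks whose scaling factor has reached zero and report the resulting reduction; the helper below is illustrative, not the authors' code.

```python
def prune_zero_blocks(blocks):
    """Keep only the blocks whose scaling factor is non-zero; zero-scaled
    blocks act as identity mappings and can be dropped without changing
    the network's output."""
    kept = [b for b in blocks if b.scale.abs().item() > 0]
    before = sum(p.numel() for b in blocks for p in b.parameters())
    after = sum(p.numel() for b in kept for p in b.parameters())
    print(f"blocks: {len(blocks)} -> {len(kept)}, "
          f"parameters: {before} -> {after}")
    return kept
```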
Discussion and Implications
The authors’ approach signifies a substantial stride in model efficiency, making CNNs potentially more applicable in computationally constrained environments such as real-time systems and embedded devices. The paper indicates that this method can potentially guide the design of new compact CNN architectures, serving a dual purpose of simplifying existing networks and inspiring innovative designs.
Additionally, future extensions could diversify the sparsity penalty with non-convex regularizers, which may further improve pruning efficacy. Such enhancements could be critical in extending the approach beyond image classification to object detection and other domains where model complexity constraints are paramount.
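As one illustration of what a non-convex alternative to the l1 penalty might look like (this particular choice is not drawn from the paper), the minimax concave penalty flattens out for large scaling factors, reducing the shrinkage bias that l1 imposes on structures worth keeping:

```latex
% Minimax concave penalty (MCP) for a single scaling factor \lambda,
% with sparsity strength \gamma > 0 and concavity parameter a > 1:
P_{\gamma,a}(\lambda) =
\begin{cases}
  \gamma\lvert\lambda\rvert - \dfrac{\lambda^{2}}{2a}, & \lvert\lambda\rvert \le a\gamma,\\[4pt]
  \dfrac{a\gamma^{2}}{2}, & \lvert\lambda\rvert > a\gamma.
\end{cases}
```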
In terms of practical and theoretical implications, the paper proposes a balanced scheme of accuracy retention and computational reduction, which is critical in contexts where resources must be conserved without major sacrifices in performance.
Conclusion
"Data-Driven Sparse Structure Selection for Deep Neural Networks" offers notable contributions towards simplifying deep neural model deployment, particularly focusing on efficient pruning and structural learning. The method's success on large-scale datasets like ImageNet underscores its potential for broad applicability. The framework sets a meaningful precedent for future research focused on adaptable, data-driven model optimization in deep learning.