Learning Structured Sparsity in Deep Neural Networks (1608.03665v4)

Published 12 Aug 2016 in cs.NE, cs.LG, and stat.ML

Abstract: High demand for computation resources severely hinders deployment of large-scale Deep Neural Networks (DNN) in resource-constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate DNN evaluation. Experimental results show that SSL achieves on average 5.1x and 3.1x speedups of convolutional layer computation of AlexNet against CPU and GPU, respectively, with off-the-shelf libraries. These speedups are about twice the speedups of non-structured sparsity; (3) regularize the DNN structure to improve classification accuracy. The results show that for CIFAR-10, regularization on layer depth can reduce a 20-layer Deep Residual Network (ResNet) to 18 layers while improving the accuracy from 91.25% to 92.60%, which is still slightly higher than that of the original ResNet with 32 layers. For AlexNet, structure regularization by SSL also reduces the error by around 1%. Open source code is at https://github.com/wenwei202/caffe/tree/scnn

Authors (5)
  1. Wei Wen (49 papers)
  2. Chunpeng Wu (12 papers)
  3. Yandan Wang (8 papers)
  4. Yiran Chen (176 papers)
  5. Hai Li (159 papers)
Citations (2,248)

Summary

  • The paper introduces Structured Sparsity Learning (SSL) as a novel regularization technique that optimizes filters, channels, filter shapes, and network depth in DNNs.
  • It employs group Lasso regularization to learn compact, hardware-friendly structures, achieving significant speedups such as 5.1× on CPUs and 3.1× on GPUs.
  • Experimental results on datasets like MNIST, CIFAR-10, and ImageNet demonstrate SSL’s capability to reduce computational operations while maintaining or improving accuracy.

Learning Structured Sparsity in Deep Neural Networks

The paper "Learning Structured Sparsity in Deep Neural Networks" authored by Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li introduces a method called Structured Sparsity Learning (SSL) aimed at optimizing deep neural networks (DNNs). This method systematically regularizes various structures within DNNs, such as filters, channels, filter shapes, and layer depth, to achieve computational efficiency and potential improvements in classification accuracy. The primary focus of this research is to address the computational and memory bottlenecks associated with deploying large-scale DNNs, especially on resource-constrained devices.

Key Contributions and Methodology

The paper identifies that traditional sparsity regularization and connection pruning methods often lead to non-structured, random connectivity within DNNs, which results in poor data locality and limited practical speed improvements. In contrast, SSL aims to maintain structured sparsity that not only enhances computational efficiency but also potentially boosts accuracy.
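
To make this distinction concrete, here is a small illustration (not taken from the paper) using the GEMM view of a convolutional layer: element-wise zeros leave the weight matrix the same size, so a dense kernel does the same work, whereas zeroing whole filters lets entire rows be dropped, yielding a genuinely smaller dense multiplication.

```python
import numpy as np

# Toy GEMM view of a conv layer: W has one row per filter,
# X holds im2col-expanded input patches (one column per output position).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 27))    # 64 filters, 3x3x3 each
X = rng.standard_normal((27, 1024))  # 1024 output positions

# Non-structured sparsity: ~80% of individual weights are zero, but the
# matrix shape is unchanged, so a dense GEMM still does the same work.
W_unstructured = W * (rng.random(W.shape) > 0.8)
Y_unstructured = W_unstructured @ X            # still a 64 x 27 x 1024 GEMM

# Structured sparsity: whole filters (rows) are zero, so they can simply be
# dropped, leaving a smaller dense GEMM that off-the-shelf BLAS accelerates.
keep = np.arange(64) % 2 == 0                  # pretend SSL zeroed half the filters
W_structured = W[keep]                         # 32 x 27
Y_structured = W_structured @ X                # 32 x 27 x 1024 GEMM, ~2x fewer FLOPs
print(Y_unstructured.shape, Y_structured.shape)
```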

Key contributions of the SSL method include:

  1. Learning Compact Structures: SSL learns compact, hardware-friendly structures in DNNs during training by applying group Lasso regularization to filters, channels, and filter shapes within each layer, as well as to the overall depth of the network (a minimal sketch of these penalty terms appears after this list).
  2. Filter-Wise and Channel-Wise Sparsity: SSL penalizes unimportant filters and channels, effectively reducing the number of these elements in convolutional layers without sacrificing accuracy. Filters zeroed out by this method eliminate the associated computational overhead.
  3. Shape-Wise Sparsity: By enabling the learning of arbitrary filter shapes, SSL removes unnecessary computation imposed by the fixed, cuboid filter shapes traditionally used in DNNs.
  4. Depth Regularization: SSL also learns how many layers a network actually needs by allowing structured sparsity to remove entire layers, which helps mitigate the gradient and degradation problems that arise in very deep networks.
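
As a rough, non-authoritative sketch of how such group Lasso penalties could be expressed for a 4-D convolutional weight tensor, the snippet below assumes a PyTorch-style (filters, channels, height, width) layout; the function names, the framework, and the regularization weights are illustrative assumptions, not the paper's Caffe implementation.

```python
import torch

def group_lasso_filters(W):
    # Filter-wise term: one group per output filter W[n, :, :, :];
    # driving a group to zero removes that filter (and its output map).
    return W.flatten(1).norm(dim=1).sum()

def group_lasso_channels(W):
    # Channel-wise term: one group per input channel W[:, c, :, :];
    # a zero group means no filter reads that input channel.
    return W.transpose(0, 1).flatten(1).norm(dim=1).sum()

def group_lasso_shapes(W):
    # Shape-wise term: one group per (channel, spatial position) across all
    # filters, W[:, c, i, j], so the learned filter shape can be arbitrary.
    return W.flatten(1).norm(dim=0).sum()

def ssl_penalty(model, lam_filter=1e-4, lam_channel=1e-4, lam_shape=1e-4):
    # Total structured-sparsity regularizer added to the usual data loss.
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, torch.nn.Conv2d):
            W = m.weight
            penalty = penalty + lam_filter * group_lasso_filters(W) \
                              + lam_channel * group_lasso_channels(W) \
                              + lam_shape * group_lasso_shapes(W)
    return penalty

# Typical use during training (criterion and model are assumed):
# loss = criterion(model(x), y) + ssl_penalty(model)
```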

Experimental Results and Analysis

The authors validate the effectiveness of SSL through comprehensive experiments on various datasets, including MNIST, CIFAR-10, and ImageNet, using different network architectures like LeNet, ConvNet, and ResNet.

  1. Speedups and Accuracy: Across different benchmarks, SSL demonstrates significant computational speedups. For example, on AlexNet, SSL achieved average speedups of 5.1× on CPU and 3.1× on GPU for the convolutional layers using off-the-shelf libraries, roughly twice the speedups obtained with non-structured sparsity (a sketch of how zeroed structures translate into smaller dense layers follows this list).
  2. Impact on Filters and Shapes: The method successfully regularized convolutional filters and shapes, learning smoother, more natural patterns that retained accuracy while reducing the number of necessary operations.
  3. Depth-Wise Regularization: For ResNet on CIFAR-10, SSL reduced a 20-layer network to as few as 14 layers while keeping accuracy at or above the 20-layer baseline, demonstrating the method's ability to balance depth and performance.
  4. Comparison with Non-Structured Sparsity: Compared with approaches such as connection pruning and ℓ1-norm regularization, SSL delivers larger practical speedups with little or no additional accuracy loss.
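
The measured speedups come from executing genuinely smaller dense layers rather than sparse kernels. Below is a minimal, hypothetical sketch (again in PyTorch, not the authors' Caffe code) of how filters whose group norm has collapsed to zero might be stripped from a trained layer; `prune_conv_filters` and the threshold value are assumptions made for illustration.

```python
import torch

def prune_conv_filters(conv, threshold=1e-3):
    # After SSL training, filters driven to (near) zero by the group Lasso
    # term can be removed outright, shrinking the layer's dense computation.
    W = conv.weight.data                             # (filters, channels, kH, kW)
    keep = W.flatten(1).norm(dim=1) > threshold      # filters with non-zero groups
    pruned = torch.nn.Conv2d(conv.in_channels, int(keep.sum()),
                             conv.kernel_size, stride=conv.stride,
                             padding=conv.padding, bias=conv.bias is not None)
    pruned.weight.data = W[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    # `keep` tells the following layer which of its input channels it can drop.
    return pruned, keep
```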

Implications and Future Directions

The SSL methodology has substantial practical implications for the domains where DNN deployment is constrained by computational resources. Key implications include:

  • Hardware Efficiency: SSL’s ability to regularize DNNs into more compact structures paves the way for more efficient deployment on a wider range of hardware platforms, potentially accelerating inference tasks in edge computing and mobile devices.
  • Model Compression: The method's ability to be combined with low-rank approximation techniques further enhances its utility in model compression scenarios, making it a valuable tool for optimizing large-scale DNNs (see the sketch after this list).
  • Regularization as Accuracy Improvement: Beyond just efficiency gains, the regularization aspect of SSL can also indirectly contribute to enhanced generalization and reduced overfitting, which is critical in many real-world applications of AI.
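
As one hedged illustration of how structured sparsity could be paired with low-rank approximation, the sketch below factorizes a (possibly already pruned) convolutional weight matrix with a truncated SVD into two smaller layers; the function name, the rank parameter, and this specific decomposition are assumptions for illustration rather than the scheme used in the paper.

```python
import torch

def low_rank_factorize(conv, rank):
    # Approximate a conv layer by a spatial conv to `rank` intermediate maps
    # followed by a 1x1 conv, via truncated SVD of the flattened weights.
    N, C, kH, kW = conv.weight.shape
    W2d = conv.weight.data.reshape(N, C * kH * kW)
    U, S, Vh = torch.linalg.svd(W2d, full_matrices=False)
    A = U[:, :rank] * S[:rank]                       # (N, rank)
    B = Vh[:rank]                                    # (rank, C*kH*kW)
    first = torch.nn.Conv2d(C, rank, (kH, kW), stride=conv.stride,
                            padding=conv.padding, bias=False)
    second = torch.nn.Conv2d(rank, N, 1, bias=conv.bias is not None)
    first.weight.data = B.reshape(rank, C, kH, kW).clone()
    second.weight.data = A.reshape(N, rank, 1, 1).clone()
    if conv.bias is not None:
        second.bias.data = conv.bias.data.clone()
    return torch.nn.Sequential(first, second)        # output ≈ original conv output
```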

Future work could delve into extending SSL methods to other types of neural networks and investigating the integration of SSL with advanced hardware optimization techniques. Additionally, research could focus on automated hyper-parameter tuning for SSL to further minimize human intervention in optimizing DNN architectures.

In conclusion, the Structured Sparsity Learning method presents pertinent advancements in the optimization of deep neural networks, addressing key issues related to computational efficiency and model accuracy. The robust experimental results corroborate the method’s efficacy, marking a vital contribution to the field of deep learning and model compression.
