Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
The paper "Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch" addresses the pressing challenge of efficiently training deep neural networks (DNNs) in resource-constrained environments. The authors propose a novel approach to construct neural networks with N:M fine-grained structured sparsity, defined as networks in which only N weights are non-zero for every M consecutive weights. This methodology aims to strike a balance between the compression benefits of unstructured fine-grained sparsity and the performance efficiency of structured coarse-grained sparsity, particularly on custom hardware such as Nvidia A100 GPUs.
Context and Challenges
Deep neural networks, while powerful, often require immense computational resources due to their vast number of parameters. This poses challenges for deploying such models in real-world applications where computational resources are limited. Sparsity techniques offer a solution by reducing the number of active parameters, but they come with trade-offs: unstructured sparsity achieves high compression but maps poorly onto hardware, yielding little practical speed-up, whereas structured coarse-grained sparsity is hardware-friendly but tends to degrade model accuracy substantially.
Approach and Methodology
This research undertakes the first in-depth study of training N:M structured sparse networks from scratch without significant performance degradation. Using the 2:4 pattern, which the A100's Sparse Tensor Cores accelerate natively, the authors demonstrate up to a 2x speed-up on Nvidia A100 GPUs without a drop in accuracy.
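The sketch below (an illustrative PyTorch implementation; the function and class names are assumptions, not the authors' released code) shows the from-scratch workflow: the dense weights remain the trainable parameters, and the N:M mask is recomputed from their magnitudes at every forward pass, so the sparse topology is learned jointly with the weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Binary N:M mask: in every group of M consecutive weights (along the
    flattened weight tensor), keep the N largest-magnitude entries."""
    groups = weight.abs().reshape(-1, m)                   # assumes numel is divisible by M
    keep = groups.topk(n, dim=1).indices                   # surviving positions per group
    mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
    return mask.reshape(weight.shape)

class NMSparseLinear(nn.Linear):
    """Linear layer trained from scratch under N:M sparsity: the mask is
    recomputed from the current dense weights at every iteration."""
    def __init__(self, in_features: int, out_features: int,
                 n: int = 2, m: int = 4, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.n, self.m = n, m

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = nm_mask(self.weight, self.n, self.m)
        # Plain masking is shown here for brevity; with SR-STE (next sketch),
        # `self.weight * mask` is replaced by SRSTE.apply(self.weight, mask)
        # so that gradients also reach the currently pruned weights.
        return F.linear(x, self.weight * mask, self.bias)
```

Keeping the dense weights as the trainable parameters is what allows pruned weights to be revived later in training as their magnitudes change.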
To make such training stable, the paper introduces the Sparse-refined Straight-through Estimator (SR-STE). With a standard STE, the forward pass uses the pruned (sparse) weights while the backward pass applies the resulting approximate gradient to all dense weights; SR-STE adds a sparse-refined term that regularizes the currently pruned weights toward zero, discouraging the set of pruned positions from oscillating between iterations. The resulting stability of the learned sparse topology is quantified by the newly introduced Sparse Architecture Divergence (SAD) metric.
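A minimal sketch of this idea is given below (PyTorch-style; the λ value and the SAD-style helper are illustrative assumptions rather than the paper's exact implementation). The forward pass applies the N:M mask; the backward pass lets the gradient flow straight through to every dense weight and additionally pulls the currently pruned weights toward zero.

```python
import torch

LAMBDA_W = 2e-4  # strength of the sparse-refined term (illustrative value)

class SRSTE(torch.autograd.Function):
    """Sparse-refined straight-through estimator (sketch)."""

    @staticmethod
    def forward(ctx, weight: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        ctx.save_for_backward(weight, mask)
        return weight * mask                      # forward uses the sparse weights

    @staticmethod
    def backward(ctx, grad_output):
        weight, mask = ctx.saved_tensors
        # STE part: the gradient computed w.r.t. the sparse weights is passed
        # straight through to all dense weights. Refinement: pruned weights
        # (mask == 0) receive an extra pull toward zero, which stabilizes the
        # learned sparse topology across iterations.
        grad_weight = grad_output + LAMBDA_W * (1.0 - mask) * weight
        return grad_weight, None                  # no gradient for the mask

def topology_change(mask_prev: torch.Tensor, mask_curr: torch.Tensor) -> int:
    """SAD-style count: positions whose kept/pruned status differs between two
    masks from different iterations (lower means a more stable topology)."""
    return int((mask_prev != mask_curr).sum().item())
```

In the layer sketch above, replacing the plain `self.weight * mask` product with `SRSTE.apply(self.weight, mask)` reproduces, under vanilla SGD without momentum or weight decay, the kind of update SR-STE prescribes.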
Strong Numerical Results
The authors conduct comprehensive experiments across various computer vision tasks (such as image classification, detection, and segmentation) and machine translation, demonstrating the efficacy of the proposed method. Notably, the sparse networks trained with their approach achieve comparable or even superior performance to their dense counterparts, with results varying with the sparsity pattern used (e.g., 2:4 or 4:8).
Implications and Future Directions
The implications of this work are multi-faceted, touching both the theory and the practice of neural network sparsity. Theoretically, it contributes to understanding the relationship between dynamic pruning, gradient estimation, and network performance. Practically, it presents a robust framework for deploying efficient DNNs on modern hardware, paving the way for more accessible deep learning applications in resource-limited environments.
As future directions, it would be valuable to explore the extension of N:M fine-grained sparsity beyond commonly used operations to others such as attention mechanisms in transformers. Furthermore, mixed or adaptive N:M sparsity could be investigated to optimize performance across varied application domains.
In summary, this work provides an important stepping stone towards efficient neural network deployment, offering insights and methodologies that are adaptable to the evolving landscape of hardware and software co-design in deep learning.