
Soft Threshold Weight Reparameterization for Learnable Sparsity (2002.03231v9)

Published 8 Feb 2020 in cs.LG, cs.CV, and stat.ML

Abstract: Sparsity in Deep Neural Networks (DNNs) is studied extensively with the focus of maximizing prediction accuracy given an overall parameter budget. Existing methods rely on uniform or heuristic non-uniform sparsity budgets which have sub-optimal layer-wise parameter allocation resulting in a) lower prediction accuracy or b) higher inference cost (FLOPs). This work proposes Soft Threshold Reparameterization (STR), a novel use of the soft-threshold operator on DNN weights. STR smoothly induces sparsity while learning pruning thresholds thereby obtaining a non-uniform sparsity budget. Our method achieves state-of-the-art accuracy for unstructured sparsity in CNNs (ResNet50 and MobileNetV1 on ImageNet-1K), and, additionally, learns non-uniform budgets that empirically reduce the FLOPs by up to 50%. Notably, STR boosts the accuracy over existing results by up to 10% in the ultra sparse (99%) regime and can also be used to induce low-rank (structured sparsity) in RNNs. In short, STR is a simple mechanism which learns effective sparsity budgets that contrast with popular heuristics. Code, pretrained models and sparsity budgets are at https://github.com/RAIVNLab/STR.

Citations (223)

Summary

  • The paper introduces STR, a novel approach that uses a differentiable soft-threshold operator to learn optimal, layer-specific sparsity in deep neural networks.
  • The method achieves state-of-the-art performance on CNNs such as ResNet50, improving ultra-sparse network accuracy by up to 10% and reducing FLOPs by as much as 50%.
  • Its extension to structured sparsity in RNNs demonstrates broad applicability, paving the way for more efficient and resource-aware deep learning models.

Soft Threshold Weight Reparameterization for Learnable Sparsity

The paper "Soft Threshold Weight Reparameterization for Learnable Sparsity" presents a novel approach to achieving sparsity in deep neural networks (DNNs) that optimizes layer-wise parameter allocation in a non-uniform manner. The central contribution is the introduction of Soft Threshold Reparameterization (STR), a technique that extends the classical soft-thresholding method to enable learnable sparsity within the training process. This approach meticulously induces sparsity across different layers by learning pruning thresholds directly through backpropagation.

Key Contributions

  1. Soft Threshold Reparameterization (STR): The paper proposes STR as a mechanism for optimizing weights directly through their projection onto sparse sets. The innovation lies in reparameterizing the weights with the soft-threshold operator, which makes sparsity learnable in a differentiable manner and lets it adapt naturally to the structure and needs of each layer in the network.
  2. Layer-Specific Sparsity: STR learns layer-specific sparsity thresholds rather than relying on global or heuristic threshold values. This lets the network automatically discover an effective distribution of sparsity across layers, which is particularly beneficial for complex architectures like CNNs and RNNs (a minimal sketch of a layer with its own learnable threshold appears after this list).
  3. Empirical Results: The authors demonstrate that STR achieves state-of-the-art accuracy for unstructured sparsity on CNNs such as ResNet50 and MobileNetV1 trained on ImageNet-1K. The learned non-uniform budgets also lower inference cost, reducing FLOPs by up to 50% in some cases, and in the ultra-sparse (99%) regime STR improves accuracy by up to 10% over competing methods.
  4. Extension to Structured Sparsity: The paper also illustrates the adaptability of STR to structured sparsity by applying it to induce low-rank structures in RNNs. This demonstrates the generalizability of the approach across different types of neural network architectures.
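To make the layer-specific behaviour concrete, the sketch below wraps the soft-threshold operator in a convolution layer that owns a single trainable threshold parameter. The class name STRConv2d, the sigmoid choice for g, and the initialization value are assumptions made for this summary, not the authors' released code (which is available in the linked repository).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class STRConv2d(nn.Conv2d):
    """2D convolution whose weights pass through a per-layer learnable
    soft threshold before being used in the forward pass.

    Illustrative sketch only: the mapping g(s) = sigmoid(s) and the
    initial value of s are assumptions made for this summary.
    """

    def __init__(self, *args, s_init: float = -10.0, **kwargs):
        super().__init__(*args, **kwargs)
        # One scalar threshold parameter per layer. A very negative
        # initial s makes g(s) close to 0, so training starts nearly dense.
        self.s = nn.Parameter(torch.tensor(s_init))

    def sparse_weight(self) -> torch.Tensor:
        threshold = torch.sigmoid(self.s)  # g(s) >= 0
        return torch.sign(self.weight) * F.relu(torch.abs(self.weight) - threshold)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Convolve with the thresholded (sparse) weights; gradients flow
        # to both the dense weights and the threshold parameter s.
        return F.conv2d(x, self.sparse_weight(), self.bias,
                        self.stride, self.padding, self.dilation, self.groups)
```

Dropping such a layer in place of a standard convolution (e.g., STRConv2d(64, 128, kernel_size=3, padding=1, bias=False)) gives every layer its own threshold, so the per-layer sparsity budget is discovered during training rather than prescribed up front, which is what lets STR shift parameters between layers and thereby reduce FLOPs.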

Implications and Future Directions

The introduction of learnable sparsity thresholds within DNNs promises several practical implications. For one, it paves the way for more resource-efficient neural networks capable of operating effectively under stringent computational budgets, making them suitable for edge devices with limited resources. The ability to learn optimal sparsity distributions could further enhance the deployment of deep learning models on mobile devices and IoT applications, where efficient use of memory and processing power is crucial.

In a theoretical context, the work challenges traditional uniform and heuristic-based sparsity allocation schemes by demonstrating the advantages of a data-driven approach in discovering the most effective sparsity patterns. This underscores a shift towards more adaptable and intelligent sparsification techniques in the field of deep learning.

Future research might explore various extensions of STR, such as experimenting with different functional forms for the threshold function g(s) or integrating STR with other model compression techniques like quantization and knowledge distillation. Moreover, evaluating the transferability of learned sparsity patterns across different tasks could provide further insights into the role of sparsity in knowledge representation and transfer learning.

Overall, STR represents a significant step towards more efficient and effective utilization of deep network architectures, promising impacts on both theoretical understanding and practical applications of machine learning.