
PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices (1909.05073v4)

Published 6 Sep 2019 in cs.LG, cs.CV, cs.DC, cs.NE, and stat.ML

Abstract: Model compression techniques for Deep Neural Networks (DNNs) are widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstream pruning approaches, representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy but is not hardware friendly; structured, coarse-grained pruning exploits hardware-efficient structures but suffers an accuracy drop when the pruning rate is high. In this paper, we introduce PCONV, comprising a new sparsity dimension: fine-grained pruning patterns inside coarse-grained structures. PCONV comprises two types of sparsity: Sparse Convolution Patterns (SCP), generated by intra-convolution-kernel pruning, and connectivity sparsity, generated by inter-convolution-kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases the pruning rate while maintaining a balanced workload across filter computations. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework and execute PCONV models in real time without accuracy compromise, which cannot be achieved in prior work. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 39.2x, 11.4x, and 6.3x, respectively, with no accuracy loss. Mobile devices can thus achieve real-time inference on large-scale DNNs.

Analysis of PCONV: Advancements in DNN Weight Pruning for Mobile Devices

The paper by Xiaolong Ma et al. addresses the critical challenge of executing deep neural networks (DNNs) in real time on mobile devices. It introduces a novel framework, PCONV, that leverages fine-grained pruning patterns within coarse-grained structures to enhance both the accuracy and computational efficiency of DNNs. This approach navigates between the traditional extremes of non-structured and structured weight pruning, optimizing performance and accuracy simultaneously.

Key Innovations and Contributions

PCONV introduces two distinct types of sparsity: Sparse Convolution Patterns (SCP) and connectivity sparsity. SCP is derived from intra-convolution-kernel pruning, which preserves accuracy by exploiting specific vision properties embedded in certain convolution patterns. Connectivity sparsity stems from inter-convolution-kernel pruning, which increases the pruning rate while balancing the computational workload across filters. Together, these innovations allow PCONV to transcend the limitations of existing pruning methods, which either compromise hardware efficiency or suffer accuracy losses at high pruning rates.
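To make the two sparsity types concrete, here is a minimal NumPy sketch (not the authors' code) that applies a fixed 4-entry pattern inside every 3×3 kernel and then prunes whole kernels by magnitude. For simplicity it shares one pattern across all kernels, whereas the paper selects each kernel's pattern from a small candidate set; the function name, the example pattern, and the 50% connectivity ratio are illustrative assumptions.

```python
import numpy as np

def apply_pconv_sparsity(weights, pattern, connectivity_ratio=0.5):
    """weights: (out_ch, in_ch, 3, 3) convolution weights.
    pattern: boolean (3, 3) mask -- the Sparse Convolution Pattern (SCP)
        kept inside every surviving kernel (intra-kernel pruning).
    connectivity_ratio: fraction of whole kernels removed per filter
        (inter-kernel / connectivity pruning)."""
    out_ch, in_ch, _, _ = weights.shape
    pruned = weights * pattern  # SCP: zero the same fixed positions in every kernel

    # Connectivity sparsity: rank kernels by L2 norm and drop the weakest,
    # removing the same number per filter so the workload stays balanced.
    norms = np.linalg.norm(pruned.reshape(out_ch, in_ch, -1), axis=2)
    keep = int(round(in_ch * (1 - connectivity_ratio)))
    for f in range(out_ch):
        drop = np.argsort(norms[f])[: in_ch - keep]
        pruned[f, drop] = 0.0
    return pruned

# Example: a 4-entry pattern keeping the center and three of its neighbors
# (one of many possible SCP layouts).
pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 0, 0]], dtype=bool)
w = np.random.randn(16, 32, 3, 3).astype(np.float32)
w_sparse = apply_pconv_sparsity(w, pattern, connectivity_ratio=0.5)
```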

To operationalize PCONV, the authors developed a compiler-assisted DNN inference framework capable of executing PCONV models in real time without compromising accuracy, a feat unachieved by prior methodologies. Notably, the framework delivers impressive speedups, outperforming leading DNN frameworks such as TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network by factors of up to 39.2×, 11.4×, and 6.3×, respectively.
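One intuition behind the compiler's gains, sketched below under stated assumptions: because every surviving kernel matches one of a few known patterns, generated code can hard-code the nonzero taps rather than scanning all nine kernel positions or chasing irregular sparse indices. The function and tap list here are hypothetical illustrations, not the authors' actual code generator.

```python
import numpy as np

# Offsets (dy, dx) of the nonzero taps for the 4-entry pattern used above.
PATTERN_TAPS = [(-1, 0), (0, -1), (0, 0), (0, 1)]

def conv2d_patterned(x, kernel, taps=PATTERN_TAPS):
    """x: (H, W) input channel; kernel: (3, 3) patterned kernel.
    Valid-style convolution touching only the pattern's taps:
    4 multiply-adds per output pixel instead of 9."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2), dtype=x.dtype)
    for dy, dx in taps:  # a real code generator would fully unroll this loop
        out += kernel[dy + 1, dx + 1] * x[1 + dy : H - 1 + dy, 1 + dx : W - 1 + dx]
    return out

# Usage: build a kernel that is nonzero only at the pattern's taps.
x = np.random.randn(8, 8).astype(np.float32)
k = np.zeros((3, 3), dtype=np.float32)
for dy, dx in PATTERN_TAPS:
    k[dy + 1, dx + 1] = np.random.randn()
y = conv2d_patterned(x, k)  # shape (6, 6)
```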

Implications and Future Prospects

The implications of PCONV are profound in both theoretical and practical dimensions. Theoretically, it opens a promising intermediate ground in the sparsity spectrum, combining high accuracy and regularity—a space hitherto underexplored. Practically, PCONV propels DNN deployment on mobile devices, facilitating real-time inference for complex networks, which is critical for applications in wearable technology, smart health devices, and autonomous systems.

Future developments could explore further optimization of the compiler-assisted framework and extend the idea of pattern-based sparsity to other forms of neural networks beyond convolutional structures. Furthermore, integrating PCONV with emerging hardware architectures optimized for sparse matrix operations could amplify its impact, enhancing computational efficiency while minimizing energy consumption—a key consideration for mobile applications.

Conclusion

Overall, PCONV provides a compelling framework that marries the flexibility and precision of fine-grained pruning with the regularity and efficiency of structured methods. By advancing the frontier of DNN weight pruning, this paper unlocks further potential for machine learning applications on mobile platforms, setting the stage for continued innovation in resource-efficient AI.

Authors (8)
  1. Xiaolong Ma (57 papers)
  2. Fu-Ming Guo (7 papers)
  3. Wei Niu (68 papers)
  4. Xue Lin (92 papers)
  5. Jian Tang (326 papers)
  6. Kaisheng Ma (46 papers)
  7. Bin Ren (136 papers)
  8. Yanzhi Wang (197 papers)
Citations (160)