Structured Pruning of Deep Convolutional Neural Networks
Overview
The paper "Structured Pruning of Deep Convolutional Neural Networks" by Sajid Anwar, Kyuyeon Hwang, and Wonyong Sung addresses the challenge of optimizing deep learning models for real-time applications, particularly on resource-constrained devices. The authors introduce a structured pruning methodology for Convolutional Neural Networks (CNNs) that incorporates channel-wise, kernel-wise, and intra-kernel strided sparsity. Their approach leverages a particle filtering technique to identify the most critical network connections, which are subsequently pruned and then retrained to compensate for lost performance. Furthermore, the pruned network is quantized, significantly reducing its storage size and enhancing suitability for embedded systems.
Methodology
The primary contribution of the paper is the introduction of structured sparsity at different granularities—channel, kernel, and intra-kernel levels. The method involves:
- Pruning Granularities (illustrated in the first sketch after this list):
- Channel-Level Pruning: Eliminates entire channels (feature maps), thereby reducing the network's dimensions directly.
- Kernel-Level Pruning: Removes entire k×k kernels from the network.
- Intra-Kernel Strided Pruning: Introduces zero-valued weights at specific, well-defined locations within kernels to enforce sparsity patterns that are computationally advantageous.
- Particle Filtering Approach:
- Selection of Pruning Candidates: Utilizes a particle filtering method to simulate different pruning patterns and evaluate their impact on network performance. The misclassification rate (MCR) is used to assign importance weights to the particles, guiding the pruning process (see the second sketch after this list).
- Evolutionary Particle Filter (EPF): Enhances the particle filter by integrating elements of genetic algorithms, maintaining diversity among pruning candidates and improving the overall efficacy of the pruning process.
- Retraining and Fixed-Point Optimization:
- After pruning, the network is retrained to mitigate any performance degradation. Fixed-point optimization then reduces memory requirements by quantizing the network weights to lower precision while preserving performance (a minimal quantization sketch follows this list).
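To make the three granularities concrete, the sketch below applies binary pruning masks to a 4-D convolution weight tensor. The tensor shape, the stride value, and the particular channels and kernels pruned are illustrative assumptions, not configurations taken from the paper.

```python
import numpy as np

# Illustrative 4-D convolution weight tensor: (out_channels, in_channels, k, k).
# The shapes and the pruned indices below are arbitrary examples.
out_ch, in_ch, k = 8, 4, 5
weights = np.random.randn(out_ch, in_ch, k, k)
mask = np.ones_like(weights)

# Channel-level pruning: drop entire output feature maps (here, channels 2 and 5).
for ch in (2, 5):
    mask[ch, :, :, :] = 0.0

# Kernel-level pruning: remove a whole k x k kernel (here, kernel (0, 1)).
mask[0, 1, :, :] = 0.0

# Intra-kernel strided pruning: keep weights only at fixed strided positions
# inside each kernel of one example filter, so the surviving pattern stays regular.
stride = 2
strided = np.zeros((k, k))
strided[::stride, ::stride] = 1.0
mask[3, :, :, :] *= strided

pruned_weights = weights * mask
print("fraction of weights kept:", mask.mean())
```

Because the surviving intra-kernel weights sit at regular, well-defined positions, the pruned tensor still maps onto dense matrix operations after convolution lowering, which is what makes this granularity computationally attractive.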
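The candidate-selection step can be pictured as a simple particle filtering loop in which each particle is a candidate pruning mask, importance weights come from the misclassification rate (MCR), and particles are resampled and perturbed between iterations. The evaluate_mcr function is a hypothetical stand-in for evaluating the pruned network on a validation set, and the loop below is a minimal sketch rather than the authors' exact evolutionary particle filter.

```python
import numpy as np

def evaluate_mcr(mask):
    """Hypothetical placeholder: in practice, apply `mask` to the network,
    run the validation set, and return the misclassification rate in [0, 1]."""
    return np.random.uniform(0.01, 0.20)  # stand-in for a real evaluation

rng = np.random.default_rng(0)
n_particles, n_units = 16, 100

# Each particle is a candidate binary pruning mask over the prunable units
# (channels, kernels, or strided intra-kernel positions).
particles = rng.integers(0, 2, size=(n_particles, n_units)).astype(float)

for _ in range(5):  # a few filtering iterations
    # Importance weights: a lower misclassification rate earns a higher weight.
    mcr = np.array([evaluate_mcr(p) for p in particles])
    importance = np.exp(-mcr / (mcr.std() + 1e-8))
    importance /= importance.sum()

    # Resample particles in proportion to their importance weights.
    idx = rng.choice(n_particles, size=n_particles, p=importance)
    particles = particles[idx]

    # Flip a few mask bits to keep diversity among candidates, loosely
    # analogous to the mutation step of the evolutionary variant (EPF).
    flips = rng.random(particles.shape) < 0.02
    particles = np.where(flips, 1.0 - particles, particles)

best_mask = particles[np.argmin([evaluate_mcr(p) for p in particles])]
```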
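Fixed-point optimization of the surviving weights can be approximated by uniform quantization to a small bit width with a single scale factor per tensor; the 8-bit width and symmetric per-tensor scaling below are assumptions for illustration, not necessarily the format used in the paper.

```python
import numpy as np

def quantize_fixed_point(weights, n_bits=8):
    """Uniformly quantize a weight tensor to signed fixed-point values.
    A minimal sketch: one symmetric scale factor for the whole tensor
    (int8 storage assumes n_bits <= 8)."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(weights)) / q_max
    q = np.clip(np.round(weights / scale), -q_max, q_max).astype(np.int8)
    return q, scale  # store the integers plus a single scale factor

weights = np.random.randn(20, 50).astype(np.float32)
q, scale = quantize_fixed_point(weights, n_bits=8)
dequantized = q.astype(np.float32) * scale
print("max quantization error:", np.max(np.abs(weights - dequantized)))
```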
Experimental Results
The authors conducted experiments on MNIST and CIFAR-10 datasets to validate their pruning approach. Key findings include:
- Performance Retention: The pruned networks maintained performance comparable to the baseline while achieving significant parameter reductions. For instance, a network pruned to a 1-20-20-20-20-500-10 configuration on MNIST retained its performance with only a marginal increase in MCR.
- Channel and Kernel Pruning: Significantly reduced the number of convolution connections while maintaining performance.
- Intra-Kernel Strided Pruning: Showed potential to further reduce computational complexity when combined with convolution lowering techniques.
Implications
Practically, this structured pruning strategy is advantageous for deploying deep learning models on embedded systems and parallel computing environments. The reduced model size not only facilitates on-chip memory storage but also minimizes energy consumption due to fewer DRAM accesses. Theoretically, the structured pruning framework provides a robust approach to balance model complexity and efficiency.
Future Directions
Future research could explore:
- Extended Profiling: Detailed examination of execution time benefits conferred by reduced convolution layer complexity.
- Network Parameter Space Exploration: Application of sequential Monte Carlo (SMC) techniques for a thorough exploration of network configuration spaces.
- Advanced Quantization Techniques: Further optimization methods to complement the pruning strategies for even more efficient deep learning models.
In summary, the structured pruning methodology presented in this paper offers a practical and theoretically sound approach to optimizing the deployment of deep learning models in resource-constrained environments. The innovative use of particle filters and structured sparsity makes this a valuable contribution to the field of neural network optimization.