
Subspace Node Pruning (2405.17506v2)

Published 26 May 2024 in cs.LG, cs.CV, and cs.NE

Abstract: Efficiency of neural network inference is undeniably important at a time when commercial use of AI models increases daily. Node pruning is the art of removing computational units such as neurons, filters, attention heads, or even entire layers to significantly reduce inference time while retaining network performance. In this work, we propose the projection of unit activations to an orthogonal subspace in which there is no redundant activity and within which we may prune nodes while simultaneously recovering the impact of lost units via linear least squares. We identify that, for effective node pruning, this subspace must be constructed using a triangular transformation matrix, a transformation which is equivalent to an unnormalized Gram-Schmidt orthogonalization. We furthermore show that the order in which units are orthogonalized can be optimized to maximally reduce node activations in our subspace and thereby form a more optimal ranking of nodes. Finally, we leverage these orthogonal subspaces to automatically determine layer-wise pruning ratios based upon the relative scale of node activations in our subspace, equivalent to cumulative variance. Our proposed method reaches the state of the art when pruning ImageNet-trained VGG-16 and rivals more complex state-of-the-art methods when pruning ResNet-50 networks across a range of pruning ratios.
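To make the mechanics concrete, here is a minimal sketch of the idea the abstract describes, assuming a single linear layer with an activation matrix A (samples × units) feeding a weight matrix W_next. The unnormalized Gram-Schmidt orthogonalization is obtained via a QR decomposition, unit importance is read off the residual energy on the diagonal of R, and the downstream impact of pruned units is recovered by linear least squares. The function names (rank_units, prune_and_compensate), the fixed orthogonalization order, and the toy shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of subspace node pruning as described in the abstract.
# Assumptions (not from the paper's code): plain NumPy, a single linear layer,
# activations orthogonalized in their natural order, illustrative names.
import numpy as np

def rank_units(A):
    """Score each unit by its non-redundant activity after an unnormalized
    Gram-Schmidt orthogonalization of the activation columns (via QR)."""
    # A = Q R with R upper triangular; |R[k, k]|^2 is the energy of unit k
    # that is not already explained by units 0..k-1.
    _, R = np.linalg.qr(A)
    return np.abs(np.diag(R)) ** 2

def prune_and_compensate(A, W_next, keep_idx, drop_idx):
    """Drop the selected units and fold their effect on the next layer into
    the kept units' weights via linear least squares."""
    A_keep, A_drop = A[:, keep_idx], A[:, drop_idx]
    # Best linear reconstruction of the dropped activations from the kept ones.
    C, *_ = np.linalg.lstsq(A_keep, A_drop, rcond=None)
    # Kept rows of W_next absorb the contribution of the dropped rows.
    return W_next[keep_idx] + C @ W_next[drop_idx]

# Toy usage with random data.
rng = np.random.default_rng(0)
A = rng.normal(size=(512, 64))        # layer activations: 512 samples, 64 units
W_next = rng.normal(size=(64, 32))    # weights of the following layer
scores = rank_units(A)
keep_idx = np.sort(np.argsort(scores)[-48:])   # keep the 48 highest-scoring units
drop_idx = np.setdiff1d(np.arange(64), keep_idx)
W_pruned = prune_and_compensate(A, W_next, keep_idx, drop_idx)
print(W_pruned.shape)                 # (48, 32)
```

In the paper the orthogonalization order is itself optimized and pruning ratios are set per layer from cumulative variance in the subspace; the sketch above keeps a fixed order and a single layer purely for illustration.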

