Speeding up Resnet Architecture with Layers Targeted Low Rank Decomposition (2309.12412v1)
Abstract: Compressing a neural network can speed up both its training and its inference. In this research, we study compressing network layers using low-rank decomposition. Our results demonstrate that obtaining an actual speedup requires the compression methodology to be aware of the underlying hardware: an analysis must be performed to choose which layers to compress. We demonstrate the advantage of our approach via a case study of compressing ResNet50 and training on the full ImageNet-ILSVRC2012 dataset, testing on two different hardware systems, the Nvidia V100 and the Huawei Ascend910. With hardware-targeted compression, results on the Ascend910 showed a 5.36% training speedup, and a 15.79% inference speedup on the Ascend310, with only a 1% drop in accuracy compared to the original uncompressed model.
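The abstract does not spell out the exact decomposition scheme or rank-selection rule, so the following is only a minimal sketch of one standard way to apply low-rank decomposition to a targeted convolutional layer: a truncated SVD of the flattened kernel splits a KxK convolution into a KxK convolution down to `rank` channels followed by a 1x1 convolution back up. It assumes PyTorch, a plain convolution (groups=1, dilation=1), and hypothetical names (`decompose_conv`, the example layer path); none of this is the paper's API.

```python
import torch
import torch.nn as nn

def decompose_conv(conv: nn.Conv2d, rank: int) -> nn.Sequential:
    """Approximate one Conv2d with a low-rank pair via truncated SVD.

    Sketch only: assumes groups=1 and dilation=1.
    """
    W = conv.weight.data                       # shape (C_out, C_in, kH, kW)
    c_out, c_in, kh, kw = W.shape
    assert rank <= min(c_out, c_in * kh * kw), "rank exceeds matrix rank bound"

    # Truncated SVD of the kernel flattened to a (C_out, C_in*kH*kW) matrix.
    U, S, Vh = torch.linalg.svd(W.reshape(c_out, -1), full_matrices=False)
    U_r = U[:, :rank] * S[:rank]               # (C_out, rank), singular values folded in
    V_r = Vh[:rank, :]                         # (rank, C_in*kH*kW)

    # First conv: KxK spatial filtering into a narrow rank-channel bottleneck.
    first = nn.Conv2d(c_in, rank, (kh, kw), stride=conv.stride,
                      padding=conv.padding, bias=False)
    # Second conv: 1x1 projection back up to the original channel count.
    second = nn.Conv2d(rank, c_out, 1, bias=conv.bias is not None)

    first.weight.data = V_r.reshape(rank, c_in, kh, kw)
    second.weight.data = U_r.reshape(c_out, rank, 1, 1)
    if conv.bias is not None:
        second.bias.data = conv.bias.data.clone()
    return nn.Sequential(first, second)

# Hypothetical usage: swap one targeted ResNet50 layer for its low-rank pair.
# model.layer3[0].conv2 = decompose_conv(model.layer3[0].conv2, rank=64)
```

Per output position, the original layer costs roughly C_out * C_in * kH * kW multiplies, while the decomposed pair costs rank * (C_in * kH * kW + C_out). That arithmetic is necessary but not sufficient for a speedup: the two smaller kernels must also map efficiently onto the target accelerator, which is why the paper argues that choosing which layers to compress should be driven by a per-hardware analysis.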