ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks (2401.10341v1)
Abstract: Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low-rank structure, has been well studied in the literature. In contrast, low-rank training, an alternative way to obtain low-rank CNNs by training them from scratch, has so far received little attention. Unlike low-rank compression, low-rank training does not require pre-trained full-rank models, and the entire training phase is performed on the low-rank structure, bringing attractive benefits for practical applications. However, existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or the need to update full-size models during training. In this paper, we perform a systematic investigation of low-rank CNN training. By identifying the proper low-rank format and performance-improving strategies, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.
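To make the distinction between low-rank compression and low-rank training concrete, the sketch below shows the general idea of training directly on a factorized convolution: a k×k convolution into a small number of `rank` channels followed by a 1×1 convolution, so no full-rank weight tensor is ever materialized or updated. This is a minimal illustrative example in PyTorch; the `LowRankConv2d` class, its factorization scheme, and the rank value are assumptions for demonstration, not ELRT's actual design.

```python
import torch
import torch.nn as nn

class LowRankConv2d(nn.Module):
    """Illustrative rank-r factorization of a Conv2d layer: a k x k
    convolution into `rank` intermediate channels, then a 1x1 convolution
    back to the desired output width. Both factors are trained from
    scratch, so the full-rank weight tensor is never formed."""
    def __init__(self, in_channels, out_channels, kernel_size, rank,
                 stride=1, padding=0):
        super().__init__()
        # Spatial factor: in_channels -> rank with the full kernel.
        self.factor1 = nn.Conv2d(in_channels, rank, kernel_size,
                                 stride=stride, padding=padding, bias=False)
        # Channel factor: rank -> out_channels with a 1x1 kernel.
        self.factor2 = nn.Conv2d(rank, out_channels, kernel_size=1, bias=True)

    def forward(self, x):
        return self.factor2(self.factor1(x))

# Minimal usage: parameters shrink roughly in proportion to
# rank / min(in_channels, out_channels) versus a full-rank layer.
layer = LowRankConv2d(in_channels=64, out_channels=128, kernel_size=3,
                      rank=16, padding=1)
x = torch.randn(2, 64, 32, 32)
print(layer(x).shape)                               # torch.Size([2, 128, 32, 32])
print(sum(p.numel() for p in layer.parameters()))   # ~11k vs ~74k full-rank
```

Because every forward and backward pass touches only the two small factors, both training-time compute and memory stay at the compressed size, which is the practical benefit the abstract refers to.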