ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks (2401.10341v1)

Published 18 Jan 2024 in cs.CV and cs.AI

Abstract: Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well studied in the literature. In contrast, low-rank training, an alternative way to train low-rank CNNs from scratch, has so far received little attention. Unlike low-rank compression, low-rank training does not require pre-trained full-rank models, and the entire training phase is performed on the low-rank structure, bringing attractive benefits for practical applications. However, existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or the need to update full-size models during training. In this paper, we perform a systematic investigation of low-rank CNN training. By identifying the proper low-rank format and performance-improving strategy, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.
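
To make the idea of low-rank training concrete, the sketch below keeps a convolutional layer in a factorized low-rank form and trains it from scratch, so full-rank weights are never instantiated or updated. This is a minimal PyTorch illustration of the general concept only, not ELRT's specific low-rank format or training strategy; the class name LowRankConv2d and the chosen rank are hypothetical.

```python
# Minimal sketch (assumed PyTorch setup): a conv layer parameterized in
# low-rank form and trained directly, never materializing full-rank weights.
import torch
import torch.nn as nn


class LowRankConv2d(nn.Module):
    """Rank-r stand-in for a C_in -> C_out, k x k convolution:
    a k x k convolution into r intermediate channels, followed by a
    1 x 1 convolution expanding to C_out. Both factors are trained
    from scratch, so the layer stays low-rank throughout training."""

    def __init__(self, in_channels, out_channels, kernel_size, rank,
                 stride=1, padding=0):
        super().__init__()
        # First factor: spatial k x k convolution into the rank-r space.
        self.reduce = nn.Conv2d(in_channels, rank, kernel_size,
                                stride=stride, padding=padding, bias=False)
        # Second factor: 1 x 1 convolution expanding back to C_out channels.
        self.expand = nn.Conv2d(rank, out_channels, kernel_size=1, bias=True)

    def forward(self, x):
        return self.expand(self.reduce(x))


# Usage: a rank-16 replacement for a 64 -> 128, 3 x 3 convolution.
layer = LowRankConv2d(64, 128, kernel_size=3, rank=16, padding=1)
y = layer(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 128, 32, 32])
```

For a 64-to-128-channel 3x3 convolution, the full-rank layer has 64 * 128 * 9 = 73,728 weights, while the rank-16 factorization above has 64 * 16 * 9 + 16 * 128 = 11,264, which is where the compactness of low-rank training comes from.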

Authors (6)
  1. Yang Sui (30 papers)
  2. Miao Yin (25 papers)
  3. Yu Gong (46 papers)
  4. Jinqi Xiao (8 papers)
  5. Huy Phan (75 papers)
  6. Bo Yuan (151 papers)
Citations (3)
