Tensor network compressibility of convolutional models (2403.14379v2)

Published 21 Mar 2024 in cs.CV, cs.LG, and quant-ph

Abstract: Convolutional neural networks (CNNs) are one of the most widely used neural network architectures, showcasing state-of-the-art performance in computer vision tasks. Although larger CNNs generally exhibit higher accuracy, their size can be effectively reduced by "tensorization" while maintaining accuracy, namely, replacing the convolution kernels with compact decompositions such as Tucker or Canonical Polyadic decompositions, or quantum-inspired decompositions such as matrix product states, and directly training the factors in the decompositions to bias the learning towards low-rank decompositions. But why doesn't tensorization seem to impact the accuracy adversely? We explore this by assessing how truncating the convolution kernels of dense (untensorized) CNNs impacts their accuracy. Specifically, we truncated the kernels of (i) a vanilla four-layer CNN and (ii) ResNet-50 pre-trained for image classification on the CIFAR-10 and CIFAR-100 datasets. We found that kernels (especially those in deeper layers) could often be truncated along several cuts, resulting in a significant loss in kernel norm but not in classification accuracy. This suggests that such "correlation compression" (which underlies tensorization) is an intrinsic feature of how information is encoded in dense CNNs. We also found that aggressively truncated models could often recover the pre-truncation accuracy after only a few epochs of re-training, suggesting that compressing the internal correlations of convolution layers does not often transport the model to a worse minimum. Our results can be applied to tensorize and compress CNN models more effectively.
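As a rough illustration of the kind of kernel truncation the abstract describes (a minimal sketch, not the authors' exact procedure), the NumPy snippet below matricizes a dense 4D convolution kernel along a chosen bipartition ("cut") of its axes, keeps only the leading singular values across that cut, and reports the fraction of kernel norm discarded. The function name, the choice of cut, and the rank are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def truncate_kernel_along_cut(kernel, cut_axes, rank):
    """Truncate a dense conv kernel along a bipartition ("cut") of its axes.

    kernel   : 4D array, e.g. shape (out_ch, in_ch, kh, kw)
    cut_axes : axes grouped on one side of the cut, e.g. (0,) or (0, 1)
    rank     : number of singular values kept across the cut
    Returns the truncated kernel and the fraction of kernel norm discarded.
    """
    ndim = kernel.ndim
    other_axes = tuple(ax for ax in range(ndim) if ax not in cut_axes)

    # Matricize: group `cut_axes` into rows and the remaining axes into columns.
    perm = tuple(cut_axes) + other_axes
    moved = np.transpose(kernel, perm)
    rows = int(np.prod([kernel.shape[a] for a in cut_axes]))
    mat = moved.reshape(rows, -1)

    # SVD and rank truncation across the cut.
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    mat_trunc = (u[:, :rank] * s[:rank]) @ vt[:rank, :]

    # Undo the reshape and axis permutation.
    trunc = mat_trunc.reshape(moved.shape)
    trunc = np.transpose(trunc, np.argsort(perm))

    norm_loss = 1.0 - np.linalg.norm(s[:rank]) / np.linalg.norm(s)
    return trunc, norm_loss

# Example: truncate a random 64x32x3x3 kernel across the (output-channel | rest) cut.
k = np.random.randn(64, 32, 3, 3)
k_trunc, loss = truncate_kernel_along_cut(k, cut_axes=(0,), rank=16)
print(k_trunc.shape, f"fraction of kernel norm discarded: {loss:.3f}")
```

In an actual experiment along the lines of the abstract, one would apply such a truncation to each convolution layer of a trained model, copy the truncated kernels back into the network, and then measure classification accuracy before and after a few epochs of re-training.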

Authors (3)
  1. Sukhbinder Singh (15 papers)
  2. Saeed S. Jahromi (28 papers)
  3. Roman Orus (77 papers)
Citations (1)
