
The Power of Linear Combinations: Learning with Random Convolutions (2301.11360v2)

Published 26 Jan 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Following the traditional paradigm of convolutional neural networks (CNNs), modern CNNs manage to keep pace with more recent, for example transformer-based, models by not only increasing model depth and width but also the kernel size. This results in large numbers of learnable model parameters that need to be handled during training. While following the convolutional paradigm with its corresponding spatial inductive bias, we question the significance of \emph{learned} convolution filters. In fact, our findings demonstrate that many contemporary CNN architectures can achieve high test accuracies without ever updating randomly initialized (spatial) convolution filters. Instead, simple linear combinations (implemented through efficient $1\times 1$ convolutions) suffice to effectively recombine even random filters into expressive network operators. Furthermore, these combinations of random filters can implicitly regularize the resulting operations, mitigating overfitting and enhancing overall performance and robustness. Conversely, retaining the ability to learn filter updates can impair network performance. Lastly, although we observe only relatively small gains from learning $3\times 3$ convolutions, the learning gains increase proportionally with kernel size, owing to the non-idealities of the independent and identically distributed (\textit{i.i.d.}) nature of default initialization techniques.
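The core mechanism described in the abstract, frozen randomly initialized spatial filters whose responses are recombined by a learnable $1\times 1$ convolution, can be sketched in a few lines of PyTorch. The module below is one illustrative reading of that idea, not the authors' reference implementation; the module name, channel counts, and the default choice of one random filter per output channel are assumptions made for the example.

```python
import torch
import torch.nn as nn

class LinearCombinationConv(nn.Module):
    """Frozen random spatial filters recombined by a learnable 1x1 convolution.

    A minimal sketch of the idea in the abstract; the exact layer layout and
    hyperparameters are assumptions, not taken from the paper.
    """

    def __init__(self, in_channels, out_channels, kernel_size=3, num_random_filters=None):
        super().__init__()
        num_random_filters = num_random_filters or out_channels
        # Randomly initialized k x k filters that are never updated during training.
        self.random_conv = nn.Conv2d(
            in_channels, num_random_filters, kernel_size,
            padding=kernel_size // 2, bias=False)
        for p in self.random_conv.parameters():
            p.requires_grad = False  # freeze: only the 1x1 combination is learned
        # Learnable 1x1 convolution that linearly recombines the random filter responses.
        self.combine = nn.Conv2d(num_random_filters, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.combine(self.random_conv(x))

# Usage: a drop-in stand-in for a fully learned 3x3 convolution layer.
layer = LinearCombinationConv(in_channels=64, out_channels=128, kernel_size=3)
y = layer(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 32, 32])
```

Only the $1\times 1$ combination weights receive gradient updates here, which mirrors the paper's claim that linear recombination of random spatial filters alone can yield expressive operators.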
