
Random Search as a Baseline for Sparse Neural Network Architecture Search (2403.08265v2)

Published 13 Mar 2024 in cs.LG, cs.AI, and cs.NE

Abstract: Sparse neural networks have shown similar or better generalization performance than their dense counterparts while having higher parameter efficiency. This has motivated a number of works to learn or search for high-performing sparse networks. While reports of task performance or efficiency gains are impressive, standard baselines are lacking, leading to poor comparability and unreliable reproducibility across methods. In this work, we propose Random Search as a baseline algorithm for finding good sparse configurations and study its performance. We apply Random Search on the node space of an overparameterized network with the goal of finding better initialized sparse sub-networks that are positioned more advantageously in the loss landscape. We record the post-training performances of the found sparse networks at various levels of sparsity and compare them against both their fully connected parent networks and random sparse configurations at the same sparsity levels. First, we demonstrate performance at different levels of sparsity and highlight that a significant level of performance can still be preserved even when the network is highly sparse. Second, we observe that for this sparse architecture search task, initialized sparse networks found by Random Search neither perform better nor converge more efficiently than their random counterparts. Thus, we conclude that Random Search may be viewed as a reasonable neutral baseline for sparsity search methods.
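The sketch below illustrates the kind of procedure the abstract describes: candidate sparse sub-networks are generated by sampling node-level masks of a freshly initialized overparameterized network at a target sparsity, and Random Search simply keeps the best-scoring candidate. It is a minimal illustration under stated assumptions, not the paper's implementation: the model (`build_mlp`), the helper names (`sample_node_mask`, `apply_node_mask`, `random_search`), and the use of initial loss as a cheap selection score are all placeholders, whereas the paper trains the found networks and compares post-training performance against dense parents and random sparse configurations.

```python
# Minimal sketch of Random Search over node-level sparsity masks, assuming a
# small PyTorch MLP. Names and the scoring criterion are illustrative only.
import torch
import torch.nn as nn


def build_mlp(in_dim=784, hidden=1024, out_dim=10):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


def sample_node_mask(hidden, sparsity, num_hidden_layers=2):
    # Keep a random subset of hidden nodes per layer; `sparsity` is the
    # fraction of nodes removed (e.g. 0.9 keeps 10% of the nodes).
    keep = max(1, int(round(hidden * (1.0 - sparsity))))
    masks = []
    for _ in range(num_hidden_layers):
        mask = torch.zeros(hidden)
        mask[torch.randperm(hidden)[:keep]] = 1.0
        masks.append(mask)
    return masks


def apply_node_mask(model, masks):
    # Zero the weight rows/columns belonging to pruned hidden nodes,
    # yielding a node-sparse sub-network at initialization.
    linears = [m for m in model if isinstance(m, nn.Linear)]
    with torch.no_grad():
        for i, mask in enumerate(masks):
            linears[i].weight.mul_(mask.unsqueeze(1))      # outgoing rows of layer i
            linears[i].bias.mul_(mask)
            linears[i + 1].weight.mul_(mask.unsqueeze(0))  # incoming columns of layer i+1


def candidate_score(model, x, y):
    # Cheap fitness proxy: loss of the masked network at initialization.
    # (The paper instead evaluates post-training performance.)
    with torch.no_grad():
        return nn.functional.cross_entropy(model(x), y).item()


def random_search(x, y, sparsity=0.9, trials=50, seed=0):
    torch.manual_seed(seed)
    best_masks, best_score = None, float("inf")
    for _ in range(trials):
        model = build_mlp()                                  # fresh random initialization
        masks = sample_node_mask(hidden=1024, sparsity=sparsity)
        apply_node_mask(model, masks)
        score = candidate_score(model, x, y)
        if score < best_score:
            best_masks, best_score = masks, score
    return best_masks, best_score


if __name__ == "__main__":
    # Toy batch standing in for real image data.
    x = torch.randn(256, 784)
    y = torch.randint(0, 10, (256,))
    masks, score = random_search(x, y, sparsity=0.9, trials=20)
    print(f"best candidate initial loss: {score:.4f}")
```

The selected masks would then be compared, after full training, against the dense parent network and against randomly drawn masks at the same sparsity level, which is the comparison the abstract reports.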

