
Do Deep Neural Network Solutions Form a Star Domain? (2403.07968v2)

Published 12 Mar 2024 in cs.LG and cs.AI

Abstract: It has recently been conjectured that the set of neural network solutions reachable via stochastic gradient descent (SGD) is convex, up to permutation invariances (Entezari et al., 2022). This means that a low-loss linear path can connect two independent solutions, provided the weights of one model are appropriately permuted. However, current methods for testing this conjecture often require very wide networks to succeed. In this work, we conjecture more generally that the SGD solution set is a "star domain": it contains a "star model" that is linearly connected to all other solutions via low-loss paths, modulo permutations. We propose the Starlight algorithm, which finds a star model for a given learning task. We validate our claim by showing that this star model is linearly connected to other independently found solutions. As an additional benefit of our study, we demonstrate improved uncertainty estimates from Bayesian model averaging over the obtained star domain. Further, we show that star models can serve as substitutes for model ensembles. Our code is available at https://github.com/aktsonthalia/starlight.
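
The linear connectivity claim above can be checked numerically by sweeping the straight line between a star model and an independently trained solution and measuring how far the loss rises above the endpoints. The sketch below is a hypothetical illustration of that check in PyTorch, not the authors' Starlight implementation (see the linked repository for that); the `star`, `other`, `loss_fn`, and `loader` arguments are assumed inputs, and the permutation alignment mentioned in the abstract is assumed to have already been applied to `other`.

```python
# Hypothetical sketch of a loss-barrier check between two models with the same
# architecture; not the authors' Starlight code.
import copy
import torch

def interpolate_state(theta_a, theta_b, alpha):
    # Elementwise linear interpolation of two state dicts: (1 - alpha) * a + alpha * b.
    return {k: (1 - alpha) * theta_a[k] + alpha * theta_b[k] for k in theta_a}

@torch.no_grad()
def loss_barrier(star, other, loss_fn, loader, device="cpu", num_points=11):
    """Estimate the loss barrier along the straight line between `star` and `other`.

    Returns max over alpha of L(theta_alpha) - [(1 - alpha) L(theta_star) + alpha L(theta_other)],
    together with the sampled (alpha, loss) pairs. Values near zero indicate the
    low-loss linear connectivity described in the abstract.
    """
    model = copy.deepcopy(star).to(device)
    model.eval()
    theta_a = {k: v.detach().float() for k, v in star.state_dict().items()}
    theta_b = {k: v.detach().float() for k, v in other.state_dict().items()}

    def avg_loss(state):
        model.load_state_dict(state)  # dtypes are cast back on load
        total, n = 0.0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total += loss_fn(model(x), y).item() * len(y)  # assumes loss_fn returns a mean
            n += len(y)
        return total / n

    alphas = torch.linspace(0.0, 1.0, num_points).tolist()
    losses = [avg_loss(interpolate_state(theta_a, theta_b, a)) for a in alphas]
    barrier = max(
        l - ((1 - a) * losses[0] + a * losses[-1]) for a, l in zip(alphas, losses)
    )
    return barrier, list(zip(alphas, losses))
```

Note that this sketch interpolates every state-dict entry, including BatchNorm running statistics; in practice these typically need to be reset or renormalized after interpolation (see reference 17, REPAIR), and the permutation alignment of `other` to `star` (e.g. via Git Re-Basin, reference 1) must be done beforehand.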

References (26)
  1. Git Re-Basin: Merging Models modulo Permutation Symmetries, December 2022. URL http://arxiv.org/abs/2209.04836. arXiv:2209.04836 [cs].
  2. Disentangling Linear Mode-Connectivity, December 2023. URL http://arxiv.org/abs/2312.09832. arXiv:2312.09832 [cs].
  3. Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling. In Proceedings of the 38th International Conference on Machine Learning, pp.  769–779. PMLR, July 2021. URL https://proceedings.mlr.press/v139/benton21a.html. ISSN: 2640-3498.
  4. Random initialisations performing above chance and how to find them, November 2022. URL http://arxiv.org/abs/2209.07509. arXiv:2209.07509 [cs].
  5. Weight uncertainty in neural network. In International Conference on Machine Learning, pp. 1613–1622. PMLR, 2015.
  6. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE, 2009.
  7. Essentially no barriers in neural network energy landscape. In International Conference on Machine Learning, pp. 1309–1318. PMLR, 2018.
  8. The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks, July 2022. arXiv:2110.06296 [cs].
  9. Linear mode connectivity and the lottery ticket hypothesis. In International Conference on Machine Learning, pp.  3259–3269. PMLR, 2020.
  10. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059. PMLR, 2016.
  11. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs, October 2018. URL http://arxiv.org/abs/1802.10026. arXiv:1802.10026 [cs, stat].
  12. Using mode connectivity for loss landscape analysis. arXiv preprint arXiv:1806.06977, 2018.
  13. Re-basin via implicit Sinkhorn differentiation. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp.  20237–20246, Vancouver, BC, Canada, June 2023. IEEE. ISBN 9798350301298. doi: 10.1109/CVPR52729.2023.01938. URL https://ieeexplore.ieee.org/document/10203740/.
  14. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp.  770–778, Las Vegas, NV, USA, June 2016. IEEE. ISBN 978-1-4673-8851-1. doi: 10.1109/CVPR.2016.90. URL http://ieeexplore.ieee.org/document/7780459/.
  15. Densely Connected Convolutional Networks, January 2018. URL http://arxiv.org/abs/1608.06993. arXiv:1608.06993 [cs].
  16. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. URL http://arxiv.org/abs/1502.03167.
  17. REPAIR: Renormalizing permuted activations for interpolation repair, 2023.
  18. Linear connectivity reveals generalization strategies. arXiv preprint arXiv:2205.12411, 2022.
  19. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  20. Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets. In Advances in Neural Information Processing Systems, volume 32. NeurIPS, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/46a4378f835dc8040c8057beb6a2da52-Abstract.html.
  21. Mechanistic Mode Connectivity. In Proceedings of the 40th International Conference on Machine Learning, pp.  22965–23004. PMLR, July 2023. URL https://proceedings.mlr.press/v202/lubana23a.html. ISSN: 2640-3498.
  22. Linear Mode Connectivity in Multitask and Continual Learning, October 2020. URL http://arxiv.org/abs/2010.04495. arXiv:2010.04495 [cs].
  23. Model Fusion via Optimal Transport, February 2021. URL http://arxiv.org/abs/1910.05653. arXiv:1910.05653 [cs, stat].
  24. Exploring diversified adversarial robustness in neural networks via robust mode connectivity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  2345–2351, 2023.
  25. Optimizing Mode Connectivity for Class Incremental Learning. In Proceedings of the 40th International Conference on Machine Learning, pp.  36940–36957. PMLR, July 2023. URL https://proceedings.mlr.press/v202/wen23b.html. ISSN: 2640-3498.
  26. Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness, July 2020. URL http://arxiv.org/abs/2005.00060. arXiv:2005.00060 [cs, stat].
