
Disentangling Linear Mode-Connectivity (2312.09832v1)

Published 15 Dec 2023 in cs.LG

Abstract: Linear mode-connectivity (LMC), or the lack thereof, is one of the intriguing characteristics of neural network loss landscapes. While empirically well established, it unfortunately still lacks a proper theoretical understanding. Worse, although empirical data points abound, a systematic study of when networks exhibit LMC is largely missing from the literature. In this work we aim to close this gap. We explore how LMC is affected by three factors: (1) architecture (sparsity, weight-sharing), (2) training strategy (optimization setup), and (3) the underlying dataset. We place particular emphasis on minimal but non-trivial settings, removing as much unnecessary complexity as possible. We believe that our insights can guide future theoretical work on uncovering the inner workings of LMC.
