Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalisation
Abstract: To achieve near-zero training error in a classification problem, the layers of a feed-forward network must disentangle the manifolds of data points with different labels, so as to facilitate discrimination. However, excessive class separation can lead to overfitting, since good generalisation requires learning invariant features, which involve some level of entanglement. We report on numerical experiments showing how the optimisation dynamics finds representations that balance these opposing tendencies with a non-monotonic trend: after a fast segregation phase, a slower rearrangement (conserved across data sets and architectures) increases the class entanglement. The training error at this inversion is stable under subsampling, and across network initialisations and optimisers, which characterises it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set, coined ``stragglers'', which are particularly influential for generalisation.
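The abstract does not specify the entanglement measure used in the experiments, but the described dynamics (fast segregation, then a slow rise in entanglement) can be illustrated with a simple neighbourhood-based score computed on hidden representations at successive training checkpoints. The sketch below is an assumption, not the paper's protocol: it scores entanglement as the mean fraction of each point's k nearest neighbours that carry a different label, so a value of 0 means fully segregated class manifolds, and the inversion would appear as the minimum of this curve over training time. The choice k = 10 and the toy Gaussian data are hypothetical.

```python
# Minimal sketch (assumed, not the paper's exact measure) of tracking
# class-manifold entanglement along training: embed the training set with
# the network's penultimate layer at each checkpoint and compute the score
# below. A fast drop followed by a slow rise would mark the "inversion".

import numpy as np

def entanglement_score(features: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Mean fraction of each point's k nearest neighbours whose label
    differs from the point's own label (0 = fully segregated classes)."""
    # Pairwise squared Euclidean distances via the expansion |a-b|^2.
    sq_norms = (features ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(d2, np.inf)            # exclude each point itself
    nn_idx = np.argsort(d2, axis=1)[:, :k]  # indices of k nearest neighbours
    mismatches = labels[nn_idx] != labels[:, None]
    return float(mismatches.mean())

if __name__ == "__main__":
    # Toy stand-in for penultimate-layer activations of two classes.
    rng = np.random.default_rng(0)
    feats = np.vstack([rng.normal(0.0, 1.0, (100, 32)),
                       rng.normal(0.5, 1.0, (100, 32))])
    labs = np.repeat([0, 1], 100)
    print(f"entanglement = {entanglement_score(feats, labs):.3f}")
```

In practice one would call `entanglement_score` on the hidden representations of the same fixed training batch at every checkpoint and plot the score against training time; the non-monotonic trend described in the abstract corresponds to this curve turning upward after its initial descent.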