A Margin-based Multiclass Generalization Bound via Geometric Complexity (2405.18590v1)
Abstract: There has been considerable effort to better understand the generalization capabilities of deep neural networks, both to build a theoretical understanding of their success and to provide directions for further improvement. In this paper, we investigate margin-based multiclass generalization bounds for neural networks that rely on a recently introduced complexity measure for neural networks, the geometric complexity. We derive a new upper bound on the generalization error that scales with the margin-normalized geometric complexity of the network and that holds for a broad family of data distributions and model classes. We investigate the bound empirically for a ResNet-18 model trained with SGD on the CIFAR-10 and CIFAR-100 datasets, with both original and randomized labels.
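To make the central quantity in the abstract concrete, the snippet below is a minimal sketch (not the authors' code) of estimating geometric complexity on a batch of data, assuming the usual definition as the mean squared Frobenius norm of the model's input-output Jacobian; the function name, batch shapes, and the ResNet-18 usage lines are illustrative assumptions only.

```python
# Minimal sketch: geometric complexity as the average squared Frobenius norm
# of the input-output Jacobian over a batch, GC(f) ~ (1/n) * sum_i ||J_f(x_i)||_F^2.
import torch

def geometric_complexity(model, inputs):
    """Estimate the geometric complexity of `model` on a batch `inputs`."""
    inputs = inputs.detach().requires_grad_(True)
    outputs = model(inputs)                      # shape: (n, num_classes)
    n, k = outputs.shape
    sq_norm = torch.zeros((), device=inputs.device)
    # Accumulate the squared Jacobian norm one output coordinate at a time;
    # summing over the batch gives per-example gradients in a single backward pass.
    for j in range(k):
        grads = torch.autograd.grad(
            outputs[:, j].sum(), inputs, retain_graph=(j < k - 1)
        )[0]                                     # d(output_j)/d(input), per example
        sq_norm = sq_norm + grads.pow(2).sum()
    return sq_norm / n

# Hypothetical usage with a ResNet-18 on CIFAR-sized inputs:
# model = torchvision.models.resnet18(num_classes=10).eval()
# x = torch.randn(8, 3, 32, 32)
# print(geometric_complexity(model, x))
```

In the paper's bound this quantity appears normalized by the classification margin, so in practice one would track it alongside the margins achieved on the training set rather than in isolation.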