A Margin-based Multiclass Generalization Bound via Geometric Complexity (2405.18590v1)

Published 28 May 2024 in stat.ML and cs.LG

Abstract: There has been considerable effort to better understand the generalization capabilities of deep neural networks, both to unlock a theoretical understanding of their success and to provide directions for further improvement. In this paper, we investigate margin-based multiclass generalization bounds for neural networks which rely on the geometric complexity, a complexity measure recently developed for neural networks. We derive a new upper bound on the generalization error which scales with the margin-normalized geometric complexity of the network and which holds for a broad family of data distributions and model classes. Our generalization bound is empirically investigated for a ResNet-18 model trained with SGD on the CIFAR-10 and CIFAR-100 datasets with both original and random labels.
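
To make the quantities in the abstract concrete, the following is a minimal sketch in PyTorch of how one might estimate the geometric complexity of a classifier, assuming it is the dataset-averaged squared Frobenius norm of the input-output Jacobian (as in the geometric-complexity literature the paper builds on), together with the multiclass margin used to normalize it. The function names, the normalization by the squared margin level, and all other details are illustrative assumptions, not the paper's actual definitions or code.

```python
# A minimal sketch (not the authors' code) of estimating geometric complexity (GC)
# on a batch, assuming GC is the dataset-averaged squared Frobenius norm of the
# input-output Jacobian, plus an illustrative margin-normalized variant.
import torch
from torch.autograd.functional import jacobian


def geometric_complexity(model: torch.nn.Module, inputs: torch.Tensor) -> float:
    """Average squared Frobenius norm of d(logits)/d(input) over the batch."""
    model.eval()  # freeze batch-norm statistics while differentiating
    total = 0.0
    for x in inputs:
        # Jacobian of the logit vector with respect to a single input example.
        jac = jacobian(lambda z: model(z.unsqueeze(0)).squeeze(0), x)
        total += jac.pow(2).sum().item()
    return total / len(inputs)


def multiclass_margins(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-example margin: correct-class logit minus the largest other logit."""
    correct = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
    return correct - masked.max(dim=1).values


def margin_normalized_gc(model, inputs, gamma: float) -> float:
    """GC divided by the squared margin level gamma (an assumed normalization;
    the exact scaling used in the paper's bound may differ)."""
    return geometric_complexity(model, inputs) / gamma ** 2
```

The per-example Jacobian loop is slow on CIFAR-scale inputs and is meant only to make the definition concrete, not to reproduce the paper's experiments.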
