A Two-Scale Complexity Measure for Deep Learning Models (2401.09184v3)
Published 17 Jan 2024 in stat.ML and cs.LG
Abstract: We introduce 2sED, a novel capacity measure for statistical models based on the effective dimension. The new quantity provably bounds the generalization error under mild assumptions on the model. Furthermore, simulations on standard data sets and popular model architectures show that 2sED correlates well with the training error. For Markovian models, we show how to efficiently approximate 2sED from below through a layerwise iterative approach, which allows us to tackle deep learning models with a large number of parameters. Simulation results suggest that the approximation is good for different prominent models and data sets.
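The paper's exact definition of 2sED is not reproduced in this abstract, but the quantity builds on the scale-dependent effective dimension of Abbas et al. (arXiv:2001.10872), which is computed from Fisher information matrices sampled over the parameter space. As a hedged illustration only, the sketch below estimates that underlying effective dimension by Monte Carlo; the function name, the precomputed stack of Fisher matrices `fishers`, and the parameters `gamma` and `n` are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def effective_dimension(fishers: np.ndarray, gamma: float, n: int) -> float:
    """Monte Carlo estimate of the scale-dependent effective dimension
    (Abbas et al., arXiv:2001.10872), a sketch under stated assumptions.

    fishers: (S, d, d) stack of normalized empirical Fisher information
             matrices, one per parameter sample drawn uniformly from the
             parameter space.
    gamma:   scale parameter in (0, 1].
    n:       number of data points; gamma * n / (2 * pi * log n) should
             exceed 1 for the logarithm in the denominator to be positive.
    """
    d = fishers.shape[-1]
    kappa = gamma * n / (2.0 * np.pi * np.log(n))  # scale factor
    # 0.5 * log det(I + kappa * F) per sample; slogdet is batched and
    # numerically stable for large d
    _, logdet = np.linalg.slogdet(np.eye(d) + kappa * fishers)
    # log of the Monte Carlo average of sqrt(det(.)), kept in log-space
    log_avg = np.logaddexp.reduce(0.5 * logdet) - np.log(fishers.shape[0])
    return 2.0 * log_avg / np.log(kappa)

# Toy usage with random symmetric PSD stand-ins for Fisher matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10, 10))
fishers = A @ A.transpose(0, 2, 1) / 10.0
print(effective_dimension(fishers, gamma=1.0, n=50_000))
```

Note that 2sED is a two-scale refinement of this quantity, and the layerwise lower bound the authors derive for Markovian models avoids assembling the full Fisher matrix of a deep network; both constructions are specified in the paper itself.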