Bayesian Inference for Consistent Predictions in Overparameterized Nonlinear Regression (2404.04498v2)
Abstract: The remarkable generalization performance of large-scale models challenges the conventional wisdom of statistical learning theory. Although recent theoretical studies have shed light on this behavior in linear models and nonlinear classifiers, a comprehensive understanding of overparameterization in nonlinear regression models is still lacking. This study explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework, extending the adaptive-prior methodology to account for the intrinsic spectral structure of the data. Posterior contraction is established for generalized linear and single-neuron models with Lipschitz continuous activation functions, demonstrating the consistency of the proposed approach's predictions. Moreover, the Bayesian framework enables uncertainty estimation for those predictions. The proposed method was validated via numerical simulations and a real-data application, showing its ability to achieve accurate predictions and reliable uncertainty estimates. This work provides a theoretical understanding of the advantages of overparameterization and a principled Bayesian approach to large nonlinear models.
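To make the abstract's setup concrete, the following is a minimal sketch of Bayesian prediction with uncertainty in an overparameterized regression (more features than samples), using a conjugate Gaussian model and a prior whose per-coordinate variances adapt to the empirical feature scales. This is an illustrative assumption for the sketch, not the paper's exact adaptive-prior construction, and the linear-Gaussian model stands in for the nonlinear models treated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized setting: more features than samples (d > n).
n, d = 50, 200
scales = np.linspace(1.0, 0.05, d)         # decaying feature scales
X = rng.normal(size=(n, d)) * scales
w_true = rng.normal(size=d) / np.sqrt(d)
sigma = 0.1                                # known noise level (assumed)
y = X @ w_true + sigma * rng.normal(size=n)

# "Spectrally adaptive" Gaussian prior (illustrative): each coordinate's
# prior variance is proportional to its empirical feature variance, so the
# prior concentrates mass along directions where the data carry signal.
prior_var = X.var(axis=0) + 1e-6
A = np.diag(1.0 / prior_var)               # prior precision matrix

# Conjugate Gaussian posterior:
#   Sigma_post = (X^T X / sigma^2 + A)^{-1},  w_post = Sigma_post X^T y / sigma^2
Sigma_post = np.linalg.inv(X.T @ X / sigma**2 + A)
w_post = Sigma_post @ X.T @ y / sigma**2

# Predictive mean and variance at new inputs; the variance term is where
# the Bayesian framework yields uncertainty estimates for free.
X_new = rng.normal(size=(5, d)) * scales
pred_mean = X_new @ w_post
pred_var = np.einsum("ij,jk,ik->i", X_new, Sigma_post, X_new) + sigma**2
```

Even though the posterior mean interpolates or nearly interpolates the training data in this regime, the adaptive prior keeps the predictive variance finite and informative, which is the behavior the paper's posterior-contraction results formalize for nonlinear models.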