Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds (2410.21702v2)
Abstract: In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLU activation for least-squares regression estimation over a large class of functions defined by composition. In this paper, we extend these results in several directions. First, we remove the i.i.d. assumption on the observations to allow for time dependence: the observations are assumed to form a Markov chain with a non-null pseudo-spectral gap. Second, we study a more general class of machine learning problems, which includes least-squares and logistic regression as special cases. Leveraging PAC-Bayes oracle inequalities and a version of Bernstein's inequality due to Paulin (2015), we derive upper bounds on the estimation risk of a generalized Bayesian estimator. In the case of least-squares regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.
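The "non-null pseudo-spectral gap" assumption refers to the quantity introduced by Paulin (2015), γ_ps = max_{k≥1} γ((P*)^k P^k)/k, where P* is the time reversal of the transition kernel P and γ denotes the spectral gap of a self-adjoint kernel. As a rough illustration only (not taken from the paper), the sketch below computes this quantity for a finite-state chain; the function names, the truncation level k_max, and the toy transition matrix are illustrative choices, and truncating the maximum over k is an assumption made for computability rather than part of Paulin's definition.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of a row-stochastic matrix P
    (left eigenvector associated with eigenvalue 1)."""
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return np.abs(pi) / np.abs(pi).sum()

def pseudo_spectral_gap(P, k_max=25):
    """Pseudo-spectral gap of Paulin (2015), gamma_ps = max_k gamma((P*)^k P^k)/k,
    truncated at k_max (illustrative choice) for a finite-state chain."""
    pi = stationary_distribution(P)
    # Time reversal (adjoint in L^2(pi)): P*_{ij} = pi_j * P_{ji} / pi_i
    P_star = (P.T * pi) / pi[:, None]
    d = np.sqrt(pi)
    n = len(pi)
    Pk, Psk, best = np.eye(n), np.eye(n), 0.0
    for k in range(1, k_max + 1):
        Pk, Psk = Pk @ P, Psk @ P_star
        M = Psk @ Pk                        # (P*)^k P^k, reversible w.r.t. pi
        S = (d[:, None] * M) / d[None, :]   # symmetrization D^{1/2} M D^{-1/2}
        lam2 = np.linalg.eigvalsh(S)[-2]    # eigvalsh is ascending: second-largest eigenvalue
        best = max(best, (1.0 - lam2) / k)
    return best

# Toy two-state chain (illustrative): gamma_ps > 0, as the paper's assumption requires
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(pseudo_spectral_gap(P))   # ~0.51 for this chain
```

For a reversible chain such as this toy example, the maximum is attained at k = 1 and γ_ps = 1 − λ₂(P)², where λ₂ is the second-largest eigenvalue of P in absolute value; the truncation at k_max mainly matters for strongly non-reversible kernels.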
- Alquier, P. User-friendly introduction to PAC-Bayes bounds. Foundations and Trends® in Machine Learning 17, 2 (2024), 174–303.
- Alquier, P., Li, X., and Wintenberger, O. Prediction of time series by statistical learning: general losses and fast rates. Dependence Modeling 1 (2013), 65–93.
- Alquier, P., and Wintenberger, O. Model selection for weakly dependent time series forecasting. Bernoulli 18, 3 (2012), 883–913.
- Banerjee, I., Rao, V. A., and Honnappa, H. PAC-Bayes bounds on variational tempered posteriors for Markov models. Entropy 23, 3 (2021), 313.
- Bartlett, P. L., Jordan, M. I., and McAuliffe, J. D. Convexity, classification, and risk bounds. Journal of the American Statistical Association 101, 473 (2006), 138–156.
- Deep learning. MIT Press, Cambridge, MA, 2017.
- Castillo, I. Bayesian nonparametric statistics, St-Flour lecture notes. arXiv preprint arXiv:2402.16422 (2024).
- Posterior and variational inference for deep neural networks with heavy-tailed weights. arXiv preprint arXiv:2406.03369 (2024).
- Deep Horseshoe Gaussian Processes. arXiv preprint arXiv:2403.01737 (2024).
- Catoni, O. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. Institute of Mathematical Statistics Lecture Notes – Monograph Series, 56. Institute of Mathematical Statistics, Beachwood, OH, 2007.
- Chérief-Abdellatif, B.-E. Convergence rates of variational inference in sparse deep learning. In International Conference on Machine Learning (2020), PMLR, pp. 1831–1842.
- A PAC-Bayes bound for deterministic classifiers. arXiv preprint arXiv:2209.02525 (2022).
- Wide stochastic networks: Gaussian limit and PAC-Bayesian training. In International Conference on Algorithmic Learning Theory (2023), PMLR, pp. 447–470.
- Douc, R., Moulines, E., Priouret, P., and Soulier, P. Markov Chains. Springer, 2018.
- Dziugaite, G. K., and Roy, D. M. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (2017).
- Hold-out estimates of prediction models for Markov processes. Statistics 57, 2 (2023), 458–481.
- Hellström, F., Durisi, G., Guedj, B., and Raginsky, M. Generalization bounds: Perspectives from information theory and PAC-Bayes. arXiv preprint arXiv:2309.04381 (2023).
- Mixing time estimation in reversible Markov chains from a single sample path. The Annals of Applied Probability 29, 4 (2019), 2439–2480.
- Kengne, W. Excess risk bound for deep learning under weak dependence. arXiv preprint arXiv:2302.07503 (2023).
- Penalized deep neural networks estimator with general loss functions under weak dependence. arXiv preprint arXiv:2305.06230 (2023).
- Deep learning for ψ-weakly dependent processes. Journal of Statistical Planning and Inference (2024), 106163.
- Sparse-penalized deep neural networks estimator under weak dependence. Metrika (2024), 1–32.
- Knoblauch, J., Jewson, J., and Damoulas, T. An optimization-centric view on Bayes' rule: Reviewing and generalizing variational inference. Journal of Machine Learning Research 23, 132 (2022), 1–109.
- Kohler, M., and Krzyżak, A. On the rate of convergence of a deep recurrent neural network estimate in a regression problem with dependent data. Bernoulli 29, 2 (2023), 1663–1685.
- Adaptive deep learning for nonparametric time series regression. arXiv preprint arXiv:2207.02546 (2022).
- Levin, D. A., and Peres, Y. Estimating the spectral gap of a reversible Markov chain from a short trajectory. arXiv preprint arXiv:1612.05330 (2016).
- Mai, T. T. Misclassification bounds for PAC-Bayesian sparse deep learning. arXiv preprint arXiv:2405.01304 (2024).
- McAllester, D. A. Some PAC-Bayesian theorems. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (New York, 1998), ACM, pp. 230–234.
- Ohn, I., and Kim, Y. Smooth function approximation by deep neural networks with general activation functions. Entropy 21, 7 (2019), 627.
- Ohn, I., and Kim, Y. Nonconvex sparse regularization for deep neural networks and its optimality. Neural Computation 34, 2 (2022), 476–517.
- Paulin, D. Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electronic Journal of Probability 20 (2015), 1–32.
- Pérez-Ortiz, M., Rivasplata, O., Shawe-Taylor, J., and Szepesvári, C. Tighter risk certificates for neural networks. The Journal of Machine Learning Research 22, 1 (2021), 10326–10365.
- Polson, N. G., and Ročková, V. Posterior concentration for sparse deep learning. Advances in Neural Information Processing Systems 31 (2018).
- Schmidt-Hieber, J. Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics 48, 4 (2020), 1875–1897.
- PAC-Bayes training for neural networks: sparsity and uncertainty quantification. arXiv preprint arXiv:2204.12392 (2022).
- Learning sparse deep neural networks with a spike-and-slab prior. Statistics & Probability Letters 180 (2022), 109246.
- Suzuki, T. Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality. 7th International Conference on Learning Representations (ICLR) (2019).
- Tsybakov, A. B. Introduction to nonparametric estimation. Springer Series in Statistics. Springer, New York, 2009.
- Wolfer, G., and Kontorovich, A. Improved estimation of relaxation time in nonreversible Markov chains. The Annals of Applied Probability 34, 1A (2024), 249–276.
- Zhang, Z., Shi, L., and Zhou, D.-X. Classification with deep neural networks and logistic loss. Journal of Machine Learning Research 25, 125 (2024), 1–117.