
Nonparametric logistic regression with deep learning (2401.12482v1)

Published 23 Jan 2024 in math.ST, stat.ML, and stat.TH

Abstract: Consider the nonparametric logistic regression problem. In logistic regression, we usually consider the maximum likelihood estimator, and the excess risk is the expectation of the Kullback-Leibler (KL) divergence between the true and estimated conditional class probabilities. However, in the nonparametric setting, the KL divergence can easily diverge, so convergence of the excess risk is difficult to prove or does not hold. Several existing studies establish convergence of the KL divergence only under strong assumptions. In most cases, the goal is to estimate the true conditional class probabilities; thus, instead of analyzing the excess risk itself, it suffices to show consistency of the maximum likelihood estimator in some suitable metric. In this paper, using a simple unified approach for analyzing the nonparametric maximum likelihood estimator (NPMLE), we directly derive convergence rates of the NPMLE in the Hellinger distance under mild assumptions. Although our results are similar to those in some existing studies, we provide simpler and more direct proofs. As an important application, we derive convergence rates of the NPMLE with deep neural networks and show that the derived rate nearly achieves the minimax optimal rate.
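
To make the abstract's quantities concrete, here is a brief sketch of the standard definitions (the notation is assumed here, not taken from the paper body). Writing $p_0(x) = P(Y = 1 \mid X = x)$ for the true conditional class probability and modeling it as $p_f(x) = 1/(1 + e^{-f(x)})$, the excess risk of an estimator $\hat{f}$ under the logistic (cross-entropy) loss is the expected KL divergence between the corresponding Bernoulli distributions:

\[
\mathcal{E}(\hat{f}) = \mathbb{E}_X\!\left[\mathrm{KL}\big(p_0(X) \,\|\, p_{\hat{f}}(X)\big)\right],
\qquad
\mathrm{KL}(p \,\|\, q) = p \log\frac{p}{q} + (1 - p)\log\frac{1 - p}{1 - q},
\]

while the squared Hellinger distance between the same two Bernoulli distributions is

\[
H^2(p, q) = \big(\sqrt{p} - \sqrt{q}\big)^2 + \big(\sqrt{1 - p} - \sqrt{1 - q}\big)^2
\]

(some authors include a factor of $1/2$). The contrast explains the abstract's point: $\mathrm{KL}(p \,\|\, q) \to \infty$ as $q \to 0$ for any fixed $p > 0$, so the excess risk can blow up whenever the estimated probability approaches $0$ or $1$ where the truth does not, whereas $H^2(p, q) \le 2$ always, which is why consistency in the Hellinger distance can be established under much milder assumptions.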

Authors (2)
  1. Atsutomo Yara
  2. Yoshikazu Terada
