
On the sample complexity of parameter estimation in logistic regression with normal design (2307.04191v4)

Published 9 Jul 2023 in math.ST, cs.IT, cs.LG, math.IT, stat.ML, and stat.TH

Abstract: The logistic regression model is one of the most popular data generation models for noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and the asymptotic performance of the maximum-likelihood estimator for logistic regression are well studied, a non-asymptotic sample complexity bound that captures the dependence on the estimation error and the inverse temperature is absent from previous analyses. We show that the sample complexity curve has two change-points in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
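To make the data-generation model in the abstract concrete, here is a minimal simulation sketch (not taken from the paper): covariates are drawn from a standard normal design, labels follow a logistic model whose parameter has $\ell_2$ norm equal to the inverse temperature, and the parameter is recovered by (essentially unregularized) maximum likelihood. The specific dimension, inverse temperature, sample size, and the use of scikit-learn's solver are illustrative assumptions, not the paper's experimental setup.

```python
# Illustrative sketch of the logistic regression model with normal design.
# Assumed values of d, beta, n; estimator is unregularized MLE via scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

d, beta, n = 20, 5.0, 10_000               # dimension, inverse temperature, sample size
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)   # unit-norm direction
w_star = beta * theta_star                 # true parameter; ||w_star||_2 = beta (inverse temperature)

X = rng.standard_normal((n, d))            # standard normal covariates
p = 1.0 / (1.0 + np.exp(-X @ w_star))      # logistic link
y = rng.binomial(1, p)                     # noisy binary labels

# Large C approximates unregularized MLE; fit_intercept=False matches the model.
mle = LogisticRegression(C=1e6, fit_intercept=False, max_iter=1000).fit(X, y)
w_hat = mle.coef_.ravel()

print("l2 parameter error:", np.linalg.norm(w_hat - w_star))
```

Larger inverse temperature means a higher signal-to-noise ratio (labels are nearly deterministic given the covariates); sweeping beta in this sketch gives a rough feel for the temperature regimes the paper separates.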

