
On the existence of the maximum likelihood estimate and convergence rate under gradient descent for multi-class logistic regression (2012.04576v5)

Published 8 Dec 2020 in cs.LG, math.ST, and stat.TH

Abstract: We revisit the problem of the existence of the maximum likelihood estimate for multi-class logistic regression. We show that one way to guarantee its existence is to assign positive probability to every class in the sample dataset. The notion of data separability is not needed, in contrast to the classical setup of multi-class logistic regression, in which each data sample belongs to exactly one class. We also provide a general and constructive estimate of the convergence rate to the maximum likelihood estimate when gradient descent is used as the optimizer. Our estimate involves bounding the condition number of the Hessian of the maximum likelihood function. The approaches used in this article rely on a simple operator-theoretic framework.
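The abstract's sufficient condition can be illustrated with a minimal sketch (not the paper's exact construction): multi-class logistic regression trained by plain gradient descent, where every sample assigns strictly positive probability to every class (soft labels), so no separability assumption is needed for the loss to have a finite minimizer. The step size and iteration count below are arbitrary illustration choices; in the paper, the admissible rate is governed by the condition number of the Hessian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3            # samples, features, classes

X = rng.normal(size=(n, d))
# Soft labels: strictly positive probability for every class per sample,
# matching the existence condition described in the abstract.
Y = rng.dirichlet(np.ones(k), size=n)

W = np.zeros((d, k))           # weight matrix to estimate

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)   # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def nll(W):
    # Negative log-likelihood (cross-entropy against the soft labels).
    P = softmax(X @ W)
    return -np.sum(Y * np.log(P)) / n

lr = 0.5                        # illustrative fixed step size
losses = [nll(W)]
for _ in range(500):
    P = softmax(X @ W)
    grad = X.T @ (P - Y) / n    # gradient of the NLL with respect to W
    W -= lr * grad
    losses.append(nll(W))

print(losses[0], "->", losses[-1])   # loss decreases toward the MLE
```

Because the loss is smooth and convex and every class carries positive probability, the iterates converge to the (finite) maximum likelihood estimate rather than diverging, which is the failure mode that separability causes in the classical one-hot setting.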

References (15)
  1. A. Albert and J. A. Anderson. On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71(1):1–10, 1984. doi: 10.1093/biomet/71.1.1.
  2. R. Bhatia. Linear algebra to quantum cohomology: The Story of Alfred Horn’s Inequalities. The American Mathematical Monthly, 108(4):289–318, 2001.
  3. E. J. Candès and P. Sur. The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression. The Annals of Statistics, 48(1), 2020. doi: 10.1214/18-AOS1789.
  4. Computing Extended Maximum Likelihood Estimates for Linear Parameter Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 53(2):417–426, 1991.
  5. Condition number analysis of logistic regression, and its implications for standard first-order solution methods, 2018.
  6. Z. Ji and M. Telgarsky. The implicit bias of gradient descent on nonseparable data. In A. Beygelzimer and D. Hsu, editors, Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pages 1772–1798. PMLR, 25–28 Jun 2019.
  7. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
  8. K. Konis. Linear programming algorithms for detecting separated data in binary logistic regression models. PhD thesis, University of Oxford, 2007.
  9. Convergence of gradient descent on separable data. In K. Chaudhuri and M. Sugiyama, editors, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 3420–3428. PMLR, 16–18 Apr 2019a.
  10. Stochastic gradient descent on separable data: Exact convergence with a fixed learning rate. In K. Chaudhuri and M. Sugiyama, editors, Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pages 3051–3059. PMLR, 16–18 Apr 2019b.
  11. M. Rychlik. A proof of convergence of multi-class logistic regression network. arXiv:1903.12600, 2019.
  12. A note on A. Albert and J. A. Anderson's conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika, 73(3):755–758, 1986.
  13. M. J. Silvapulle. On the Existence of Maximum Likelihood Estimators for the Binomial Response Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 43(3):310–313, 1981. doi: 10.1111/j.2517-6161.1981.tb01676.x.
  14. M. J. Silvapulle and J. Burridge. Existence of Maximum Likelihood Estimates in Regression Models for Grouped and Ungrouped Data. Journal of the Royal Statistical Society Series B: Statistical Methodology, 48(1):100–106, 1986. doi: 10.1111/j.2517-6161.1986.tb01394.x.
  15. The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square. Probability Theory and Related Fields, 175(1-2):487–558, 2019. doi: 10.1007/s00440-018-00896-9.

