
Accurate Inference for Penalized Logistic Regression (2410.20045v1)

Published 26 Oct 2024 in stat.ME

Abstract: Inference for high-dimensional logistic regression models using penalized methods has been a challenging research problem. A major difficulty is the significant bias of the Lasso estimator, which limits its direct application in inference. Although various bias-corrected Lasso estimators have been proposed, they often still exhibit substantial biases in finite samples, undermining their inference performance. These finite-sample biases become particularly problematic in one-sided inference problems, such as one-sided hypothesis testing. This paper proposes a novel two-step procedure for accurate inference in high-dimensional logistic regression models. In the first step, we propose a Lasso-based variable selection method to select a suitable submodel of moderate size for subsequent inference. In the second step, we introduce a bias-corrected estimator to fit the selected submodel. We demonstrate that the resulting estimator from this two-step procedure has a small bias order and enables accurate inference. Numerical studies and an analysis of alcohol consumption data are included, where our proposed method is compared to alternative approaches. Our results indicate that the proposed method exhibits significantly smaller biases than alternative methods in finite samples, thereby leading to improved inference performance.
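The two-step structure described in the abstract (select a submodel with the Lasso, then refit that submodel) can be illustrated with a rough sketch. The abstract does not specify the paper's selection rule or its bias-corrected estimator, so this sketch substitutes a plain L1-penalized fit for step one and an ordinary unpenalized refit for step two, and it omits the bias correction entirely. All function names, the penalty level `lam`, the step size, and the simulated data are assumptions made for illustration only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient(X, y, beta):
    # Gradient of the average negative log-likelihood (no intercept).
    n, p = len(X), len(beta)
    g = [0.0] * p
    for xi, yi in zip(X, y):
        r = sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi
        for j in range(p):
            g[j] += r * xi[j] / n
    return g

def soft_threshold(v, t):
    # Proximal operator of the L1 penalty; returns exact zeros.
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_logistic(X, y, lam=0.0, step=0.5, iters=500):
    # Proximal gradient descent; lam = 0 gives an unpenalized fit.
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(X, y, beta)
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return beta

def two_step(X, y, lam):
    # Step 1 (assumed stand-in): keep variables with nonzero Lasso coefficients.
    lasso = fit_logistic(X, y, lam=lam)
    selected = [j for j, b in enumerate(lasso) if b != 0.0]
    # Step 2 (assumed stand-in): unpenalized refit on the selected submodel.
    # The paper's bias correction is NOT reproduced here.
    X_sub = [[row[j] for j in selected] for row in X]
    refit = fit_logistic(X_sub, y) if selected else []
    return selected, refit

# Hypothetical simulated data: two active variables out of ten.
random.seed(0)
true_beta = [2.0, -2.0] + [0.0] * 8
X = [[random.gauss(0, 1) for _ in true_beta] for _ in range(200)]
y = [1 if random.random() < sigmoid(sum(b * v for b, v in zip(true_beta, row)))
     else 0 for row in X]

selected, refit = two_step(X, y, lam=0.1)
print("selected indices:", selected)
print("refitted coefficients:", refit)
```

Refitting only the selected variables removes the shrinkage that the L1 penalty imposes on the retained coefficients, which is one reason two-step schemes are attractive; any finite-sample bias that remains in the refit is what a bias-corrected second step, such as the one the paper proposes, would target.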

