Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution (2405.00424v2)
Abstract: Ridge regression is an indispensable tool in big data analysis. Yet its inherent bias poses a significant and longstanding challenge, compromising both statistical efficiency and scalability across various applications. To tackle this critical issue, we introduce an iterative strategy that corrects the bias effectively when the dimension $p$ is less than the sample size $n$. For $p>n$, our method optimally mitigates the bias, in the sense that any bias remaining in the proposed de-biased estimator cannot be removed by a linear transformation of the response data. To address this remaining bias when $p>n$, we employ a Ridge-Screening (RS) method that produces a reduced model suitable for bias correction. Crucially, under certain conditions, the true model is nested within the selected one, highlighting RS as a novel variable-selection approach. Through rigorous analysis, we establish the asymptotic properties of our de-biased ridge estimators and the validity of the resulting inferences for both $p<n$ and $p>n$, where both $p$ and $n$, along with the number of iterations, may grow to infinity. We further validate these results using simulated and real-world data examples. Our method offers a transformative solution to the bias challenge in ridge regression inference across various disciplines.
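As a concrete illustration of the two ideas in the abstract, the sketch below implements (i) a plug-in iterative bias correction for the ridge estimator when $p<n$, using the standard bias formula $\mathrm{E}[\hat\beta_\lambda]-\beta=-\lambda(X^\top X+\lambda I)^{-1}\beta$, and (ii) a screen-then-correct step in the spirit of Ridge-Screening for $p>n$. This is a minimal sketch under stated assumptions, not the paper's exact algorithm: the fixed iteration count, the ranking of predictors by absolute ridge coefficients, and all function names are illustrative assumptions.

```python
import numpy as np

def debiased_ridge(X, y, lam, n_iter=50):
    """Iteratively bias-corrected ridge estimator (p < n sketch)."""
    p = X.shape[1]
    A = np.linalg.inv(X.T @ X + lam * np.eye(p))   # (X'X + lam*I)^{-1}
    beta_ridge = A @ (X.T @ y)                     # ordinary ridge estimate
    beta = beta_ridge.copy()
    for _ in range(n_iter):
        # Plug the current estimate into the ridge bias formula
        # bias(beta) = -lam * A @ beta, and add the estimated bias back.
        beta = beta_ridge + lam * (A @ beta)
    return beta

def ridge_screen_then_debias(X, y, lam, d, n_iter=50):
    """Screen-then-correct sketch for p > n: keep the d < n predictors
    with the largest absolute ridge coefficients (an assumed ranking
    rule), then run the p < n correction on the reduced design."""
    p = X.shape[1]
    A = np.linalg.inv(X.T @ X + lam * np.eye(p))   # invertible since lam > 0
    beta_ridge = A @ (X.T @ y)
    keep = np.argsort(np.abs(beta_ridge))[-d:]     # indices of top-d predictors
    beta = np.zeros(p)
    beta[keep] = debiased_ridge(X[:, keep], y, lam, n_iter)
    return beta

# Usage on synthetic data: 5 active predictors out of p = 300, n = 100.
rng = np.random.default_rng(0)
n, p = 100, 300
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + rng.standard_normal(n)
print(ridge_screen_then_debias(X, y, lam=1.0, d=20)[:6])
```

In the $p<n$ case the update $\hat\beta^{(k+1)}=\hat\beta_\lambda+\lambda(X^\top X+\lambda I)^{-1}\hat\beta^{(k)}$ forms a convergent Neumann series (the correction matrix has spectral radius below one when $X^\top X$ is positive definite), so letting the iteration count grow recovers the OLS estimator, while stopping early trades residual bias for variance.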