Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm (2209.08139v5)
Abstract: Bayesian variable selection methods are powerful techniques for fitting sparse high-dimensional linear regression models and drawing inference from them. However, many are computationally intensive or require restrictive prior distributions on model parameters. In this paper, we propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Only minimal prior assumptions on the parameters are required, through the use of plug-in empirical Bayes estimates of hyperparameters. Efficient maximum a posteriori (MAP) estimation is carried out with a Parameter-Expanded Expectation-Conditional-Maximization (PX-ECM) algorithm. The PX-ECM yields a robust, computationally efficient coordinate-wise optimization which, when updating the coefficient for a particular predictor, adjusts for the impact of the other predictor variables. The E-step is completed using an approach motivated by the popular two-group approach to multiple testing. The result is a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm for sparse high-dimensional linear regression, which can be run using one-at-a-time or all-at-once type optimization. We compare the empirical properties of PROBE to those of comparable approaches through numerous simulation studies and analyses of cancer cell drug responses. The proposed approach is implemented in the R package probe.
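As a usage sketch (not taken from the paper), the following R snippet simulates a sparse high-dimensional regression problem and fits it with the probe package mentioned in the abstract. The function name `probe()`, its `Y`/`X` arguments, and the structure of the returned fit are assumptions based on the package description; consult the package documentation for the actual interface.

```r
## Hedged usage sketch: simulate sparse high-dimensional data and fit PROBE.
## The call below assumes the package exports a probe(Y, X) fitting function;
## check the package documentation for the real interface and return values.
# install.packages("probe")
library(probe)

set.seed(1)
n <- 100; p <- 400; s <- 10                 # n << p, with s truly nonzero effects
X <- matrix(rnorm(n * p), nrow = n)         # design matrix
beta <- c(rnorm(s, sd = 2), rep(0, p - s))  # sparse coefficient vector
Y <- drop(X %*% beta) + rnorm(n)            # response with Gaussian noise

fit <- probe(Y = Y, X = X)                  # assumed main fitting function
str(fit)                                    # inspect the returned MAP estimates
```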