A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression (2208.10910v3)
Abstract: We introduce a new empirical Bayes approach for large-scale multiple linear regression. Our approach combines two key ideas: (i) the use of flexible "adaptive shrinkage" priors, which approximate the nonparametric family of scale mixtures of normal distributions by a finite mixture of normal distributions; and (ii) the use of variational approximations to efficiently estimate prior hyperparameters and compute approximate posteriors. Combining these two ideas results in fast and flexible methods, with computational speed comparable to fast penalized regression methods such as the Lasso, and with competitive prediction accuracy across a wide range of scenarios. Further, we provide new results that establish conceptual connections between our empirical Bayes methods and penalized methods. Specifically, we show that the posterior mean from our method solves a penalized regression problem, with the form of the penalty function being learned from the data by directly solving an optimization problem (rather than being tuned by cross-validation). Our methods are implemented in an R package, mr.ash.alpha, available from https://github.com/stephenslab/mr.ash.alpha.
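As a rough illustration of the intended workflow, the sketch below simulates a sparse regression and fits it with the mr.ash.alpha package. This is a minimal sketch, not a definitive usage guide: the fitting function mr.ash(X, y) and the predict() method are assumed from the package repository, and argument names may differ across versions.

```r
# Minimal sketch (assumption: the package exports mr.ash(X, y) and a
# predict() method for fitted objects; interfaces may differ by version).
# remotes::install_github("stephenslab/mr.ash.alpha")
library(mr.ash.alpha)

set.seed(1)
n <- 500; p <- 2000; s <- 20            # n samples, p predictors, s nonzero effects
X <- matrix(rnorm(n * p), n, p)
b <- rep(0, p); b[sample(p, s)] <- rnorm(s)
y <- drop(X %*% b + rnorm(n))

fit  <- mr.ash(X, y)                    # empirical Bayes fit: prior mixture weights
                                        # and residual variance estimated from the data
yhat <- predict(fit, X)                 # posterior-mean predictions (no CV tuning step)
cat("training RMSE:", sqrt(mean((y - yhat)^2)), "\n")
```

The key practical contrast with Lasso-style workflows, as described in the abstract, is that the shrinkage behavior is learned by optimizing over the prior's mixture weights rather than by cross-validating a penalty strength.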