Blessing of dimension in Bayesian inference on covariance matrices (2404.03805v1)

Published 4 Apr 2024 in stat.ME

Abstract: Bayesian factor analysis is routinely used for dimensionality reduction in modeling of high-dimensional covariance matrices. Factor analytic decompositions express the covariance as the sum of a low-rank matrix and a diagonal matrix. In practice, Gibbs sampling algorithms are typically used for posterior computation, alternating between updating the latent factors, loadings, and residual variances. In this article, we exploit a blessing of dimensionality to develop a provably accurate pseudo-posterior for the covariance matrix that bypasses the need for Gibbs or other variants of Markov chain Monte Carlo sampling. Our proposed Factor Analysis with BLEssing of dimensionality (FABLE) approach relies on a first-stage singular value decomposition (SVD) to estimate the latent factors, and then defines a jointly conjugate prior for the loadings and residual variances. The accuracy of the resulting pseudo-posterior for the covariance improves with increasing dimensionality. We show, both theoretically and through simulation experiments, that FABLE has excellent performance in high-dimensional covariance matrix estimation, including producing well-calibrated credible intervals. We also demonstrate the strength of our approach in terms of accurate inference and computational efficiency by applying it to a gene expression data set.
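
The abstract outlines a two-stage recipe: estimate the latent factors once via a first-stage SVD, then exploit conjugacy to draw from a closed-form pseudo-posterior for the loadings and residual variances, with no MCMC. The sketch below illustrates that idea under assumptions of ours rather than the paper's exact specification: a Normal-Inverse-Gamma prior for each column's loadings and residual variance, sqrt(n)-scaled left singular vectors as the factor estimates, and illustrative hyperparameters (a0, b0, tau2). The function name fable_pseudo_posterior is hypothetical.

```python
import numpy as np

def fable_pseudo_posterior(Y, k, n_draws=100, a0=1.0, b0=1.0, tau2=1.0, seed=None):
    """Draw pseudo-posterior samples of a p x p covariance matrix.

    A minimal sketch of the two-stage idea in the abstract:
    (1) estimate k latent factors by a first-stage SVD of the n x p
        data matrix Y, then
    (2) use Normal-Inverse-Gamma conjugacy, column by column, for the
        loadings and residual variances given those fixed factors.
    Hyperparameters and scaling conventions here are illustrative
    assumptions, not the paper's exact specification.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape

    # Stage 1: rank-k SVD; scale the left singular vectors by sqrt(n)
    # so the factor-score estimates have unit empirical scale.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    eta = np.sqrt(n) * U[:, :k]                      # n x k factor estimates

    # Stage 2: for each column j, the model Y[:, j] = eta @ lam_j + eps_j,
    # eps_j ~ N(0, sigma2_j I), with prior lam_j | sigma2_j ~ N(0, sigma2_j * tau2 * I)
    # and sigma2_j ~ InvGamma(a0, b0), is a conjugate Gaussian regression.
    G = eta.T @ eta + np.eye(k) / tau2               # shared posterior precision (up to sigma2_j)
    G_inv = np.linalg.inv(G)
    L = np.linalg.cholesky(G_inv)                    # L @ L.T = G_inv
    B = G_inv @ eta.T @ Y                            # k x p posterior means of loadings
    a_n = a0 + 0.5 * n
    b_n = b0 + 0.5 * (np.einsum("ij,ij->j", Y, Y)    # y_j' y_j - B_j' G B_j, per column
                      - np.einsum("ij,ij->j", B, G @ B))

    draws = np.empty((n_draws, p, p))
    for s in range(n_draws):
        # Residual variances: independent inverse-gamma draws per column.
        sigma2 = 1.0 / rng.gamma(a_n, 1.0 / b_n)     # length-p vector
        # Loadings: Gaussian draw around the posterior mean, scaled per column.
        Lam = (B + (L @ rng.standard_normal((k, p))) * np.sqrt(sigma2)).T
        # Covariance draw: low-rank plus diagonal.
        draws[s] = Lam @ Lam.T + np.diag(sigma2)
    return draws
```

Because the first-stage factor estimate is held fixed, each of the p column-wise regressions is a standard conjugate Gaussian linear model, so these draws are exact and independent rather than steps of a Markov chain, and the per-column updates are embarrassingly parallel; this is what lets the approach bypass Gibbs sampling.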
