
Posterior contraction in sparse Bayesian factor models for massive covariance matrices

Published 16 Jun 2012 in math.ST (arXiv:1206.3627v4)

Abstract: Sparse Bayesian factor models are routinely implemented for parsimonious dependence modeling and dimensionality reduction in high-dimensional applications. We provide theoretical understanding of such Bayesian procedures in terms of posterior convergence rates in inferring high-dimensional covariance matrices where the dimension can be larger than the sample size. Under relevant sparsity assumptions on the true covariance matrix, we show that commonly-used point mass mixture priors on the factor loadings lead to consistent estimation in the operator norm even when $p\gg n$. One of our major contributions is to develop a new class of continuous shrinkage priors and provide insights into their concentration around sparse vectors. Using such priors for the factor loadings, we obtain a similar rate of convergence to that obtained with point mass mixture priors. To obtain the convergence rates, we construct test functions to separate points in the space of high-dimensional covariance matrices using insights from random matrix theory; the tools developed may be of independent interest. We also derive minimax rates and show that the Bayesian posterior rates of convergence coincide with the minimax rates up to a $\sqrt{\log n}$ term.

Summary

  • The paper derives precise posterior contraction rates for estimating high-dimensional covariance matrices using sparse Bayesian factor models, achieving near minimax optimality.
  • It compares point mass mixture and continuous shrinkage priors, revealing that continuous shrinkage provides practical and robust performance in ultra-high-dimensional settings.
  • The study utilizes novel test constructions based on random matrix theory and simulation experiments to validate its theoretical guarantees and computational effectiveness.

Posterior Contraction in Sparse Bayesian Factor Models for Massive Covariance Matrices

Introduction and Problem Setting

This work addresses the theoretical properties of posterior contraction in sparse Bayesian factor models for estimating high-dimensional covariance matrices, specifically focusing on situations where the ambient dimension $p$ significantly exceeds the sample size $n$ ($p \gg n$). The focus is on latent factor models for covariance estimation under sparsity: each observed vector $y_i \in \mathbb{R}^p$ is modeled as

$$y_i = \Lambda \eta_i + \varepsilon_i, \qquad \varepsilon_i \sim N_p(0, \Omega),$$

where $\Lambda$ is a $p \times k$ factor loading matrix (with $k \ll p$), $\eta_i \sim N_k(0, I_k)$, and $\Omega$ is diagonal. Marginally, the covariance takes the reduced form $\Sigma = \Lambda \Lambda^T + \Omega$, drastically reducing the number of free parameters from $O(p^2)$ to $O(pk)$. The key practical and theoretical challenge is to obtain non-trivial estimation guarantees when $p$ grows much faster than $n$.
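As a concrete illustration, here is a minimal Python sketch of this low-rank-plus-diagonal generative model (all dimensions and scales are hypothetical choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k, s = 500, 50, 3, 20  # hypothetical: dimension, sample size, factors, sparsity

# Sparse loading matrix Lambda: each column has s nonzero entries.
Lam = np.zeros((p, k))
for j in range(k):
    support = rng.choice(p, size=s, replace=False)
    Lam[support, j] = rng.normal(0.0, 1.0, size=s)

omega = rng.uniform(0.5, 1.5, size=p)     # diagonal of Omega
Sigma = Lam @ Lam.T + np.diag(omega)      # implied covariance Sigma = Lambda Lambda^T + Omega

# Generate y_i = Lambda eta_i + eps_i with eta_i ~ N_k(0, I_k), eps_i ~ N_p(0, Omega).
eta = rng.normal(size=(n, k))
eps = rng.normal(size=(n, p)) * np.sqrt(omega)
Y = eta @ Lam.T + eps

print(Y.shape)  # (50, 500): n samples in p dimensions, p >> n
```

Note that only $pk + p$ parameters (`Lam` and `omega`) determine all $p(p+1)/2$ entries of `Sigma`, which is the dimensionality reduction the model exploits.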

The analysis emphasizes "ultra"-high-dimensional settings and provides precise posterior contraction rates under sparsity assumptions and prior constructions that are relevant for genomic, neuroscience, and other modern high-dimensional datasets.

Assumptions and Prior Structures

The theoretical development rests on several assumptions:

  • Factor Structure on Truth: The true covariance has the form $\Sigma_{0n} = \Lambda_{0n} \Lambda_{0n}^T + \Omega_{0n}$, with growing $p_n$ and possibly growing $k_{0n}$.
  • Column Sparsity: Each column of $\Lambda_{0n}$ has at most $s_n$ nonzero components ($s_n \ll p_n$), reflecting factor sparsity.
  • Pervasiveness and Conditioning: Spectral conditions on $\Lambda_{0n}$, echoing "pervasive" factors in random matrix theory, with mild restrictions on the growth of the largest eigenvalue $c_n$ and minimal conditions on the residual variance.
  • Sample Size and Model Complexity: Conditions such that $c_n k_{0n}^{3/2} \sqrt{s_n \log p_n / n} \, \sqrt{\log n} \to 0$, which allow $p_n$ to be of order $\exp(n^\alpha)$ for some $\alpha \in (0, 1/5)$ under typical sparsity regimes.

Two types of priors for loadings are analyzed:

  1. Point Mass Mixture Priors (Spike-and-Slab): Each entry is zero with high probability, otherwise drawn from a heavy-tailed distribution. This matches frequentist penalization with exact sparsity but is computationally challenging.
  2. Continuous Shrinkage Priors: Hierarchical scale mixtures of Laplace (double exponential) distributions with global-local scales (motivated by the Horseshoe, Dirichlet-Laplace, etc.). These enable efficient MCMC without exact zeros.

Priors on $k$ favor small numbers of factors, and priors on $\sigma^2$ (the residual variances) are diffuse but proper.
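The qualitative difference between the two prior families is easy to see in a short sketch (an illustrative spike-and-slab with a Cauchy slab, and a horseshoe-style global-local scale mixture standing in for the paper's Laplace mixtures; all hyperparameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 10_000      # length of one loading column
theta = 0.02    # hypothetical inclusion probability for the spike-and-slab

# 1. Point mass mixture: exact zero w.p. 1 - theta, heavy-tailed slab otherwise.
slab = rng.standard_cauchy(p)
spike_slab = np.where(rng.random(p) < theta, slab, 0.0)

# 2. Continuous shrinkage (horseshoe-style): global scale tau, local scales lambda_j.
tau = 0.01                            # small global scale encourages near-sparsity
lam = np.abs(rng.standard_cauchy(p))  # half-Cauchy local scales
shrinkage = rng.normal(0.0, 1.0, p) * tau * lam

print((spike_slab == 0.0).mean())  # most entries exactly zero
print((shrinkage == 0.0).mean())   # no exact zeros, but most entries are tiny
```

The shrinkage draw has full support, which is what permits simple conjugate-style MCMC updates, while its heavy local tails still allow a few entries to escape shrinkage.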

Main Theoretical Results

Posterior Contraction Rates

A central contribution is the derivation of explicit posterior contraction rates for estimating the covariance $\Sigma_{0n}$ in the operator (spectral) norm, under both point mass mixture and continuous shrinkage priors. The results can be summarized as follows:

  • Operator Norm Consistency: If $s_n \gtrsim \log p_n$ and $k_{0n} = O(1)$, the posterior contracts at rate

$$\varepsilon_n = c_n \sqrt{\frac{s_n \log p_n}{n}} \sqrt{\log n}$$

in operator norm, i.e.,

$$\Pi_n \left( \| \Sigma_n - \Sigma_{0n} \|_2 > M \varepsilon_n \mid y^{(n)} \right) \to 0$$

in probability, for any sufficiently large $M$. The dependence on $c_n$ encodes the "energy" in the largest eigenvalue.

  • General $k_{0n}$ Growth: If $k_{0n}$ is allowed to grow, the contraction rate becomes

$$c_n k_{0n}^{3/2} \sqrt{\frac{s_n \log p_n}{n}} \sqrt{\log n}.$$

Thus, sparsity and factor proliferation both contribute to statistical complexity.
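For intuition about how these quantities trade off, a small numeric sketch (all values illustrative) evaluates the rate as $n$ grows with $p_n = \exp(n^{0.15})$, i.e., within the admissible $\alpha < 1/5$ regime:

```python
import math

def contraction_rate(c_n, k0, s_n, p_n, n):
    """epsilon_n = c_n * k0^{3/2} * sqrt(s_n * log(p_n) / n) * sqrt(log n)."""
    return c_n * k0 ** 1.5 * math.sqrt(s_n * math.log(p_n) / n) * math.sqrt(math.log(n))

# With p_n = exp(n^alpha) for alpha = 0.15 < 1/5, the rate still tends to zero.
rates = []
for n in (10**3, 10**4, 10**5):
    p_n = math.exp(n ** 0.15)
    rates.append(contraction_rate(c_n=1.0, k0=3, s_n=20, p_n=p_n, n=n))
print(rates)  # strictly decreasing toward zero
```

Even though $\log p_n = n^{0.15}$ grows, the $1/\sqrt{n}$ factor wins, which is the sense in which exponentially large dimension is tolerated.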

  • Matching Minimax Lower Bounds: For fixed $k_{0n}$, the posterior rates coincide with a new minimax lower bound (proved via Fano's method) up to an explicit $\sqrt{\log n}$ factor:

$$\inf_{\hat{\Sigma}_n} \sup_{\Sigma_{0n}} \mathbb{E} \| \hat{\Sigma}_n - \Sigma_{0n} \|_2 \ge c_n \sqrt{s_n \log p_n / n}.$$

This demonstrates the theoretical efficiency of the Bayesian procedures under the specified conditions.

  • Robustness to Prior Specification: The optimal contraction rates are obtained for both point mass mixture and the proposed shrinkage priors, provided the priors allocate sufficient mass to neighborhoods of the true sparse loading vectors.

Priors and High-Dimensional Properties

The paper develops properties of the analyzed continuous shrinkage priors that guarantee:

  • Prior Concentration: Lower bounds for the prior probability of small balls around arbitrary $s$-sparse vectors, comparable to those for point mass mixture priors.
  • Effective Dimensionality Control: Exponential decay of the probability that the number of "large" (exceeding $\delta$) entries in a loading column is much bigger than order $s_n$.
  • Tail Control: Subexponential deviation bounds on the $\ell_1$ norm of draws from the priors, ensuring concentration on "reasonable" model sizes.

These results are nontrivial as traditional global-local shrinkage mechanisms do not guarantee the necessary localized prior mass in ultra-high dimensions.
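The effective-dimensionality property above can be probed empirically. The sketch below draws a loading column from a Laplace scale mixture with half-Cauchy local scales (a stand-in for the paper's priors; the global scale is tuned ad hoc, not by the paper's recipe) and counts entries exceeding a threshold $\delta$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, s_n, delta, reps = 2000, 25, 0.1, 200

# Global scale chosen (ad hoc) so that roughly order-s_n entries exceed delta.
tau = s_n * delta / p
counts = []
for _ in range(reps):
    lam = np.abs(rng.standard_cauchy(p))   # half-Cauchy local scales
    theta = rng.laplace(0.0, tau * lam)    # Laplace(scale = tau * lambda_j) entries
    counts.append(int((np.abs(theta) > delta).sum()))

print(np.mean(counts))  # typically of the same order as s_n, far below p
```

The point of the theory is that such "large-entry counts" not only have small mean but have exponentially light upper tails, so the posterior does not wander into dense models.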

Test Construction and Proof Techniques

The authors advance the theory of posterior contraction in non-Hellinger metrics by constructing nonparametric tests for covariance matrices using techniques from random matrix concentration inequalities. The construction leverages the fact that, under the factor model, the dominant contribution to the operator norm comes from the low-rank term, enabling projection-based tests with exponentially decaying type I and II errors in high dimensions.
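The low-rank-dominance phenomenon behind these tests is easy to see numerically. In this toy sketch (dimensions made up), the operator norm of the rank-$k$ part $\Lambda \Lambda^T$ far exceeds that of the diagonal residual:

```python
import numpy as np

rng = np.random.default_rng(3)
p, k, s = 400, 2, 15

# Sparse "pervasive" loadings: a few large entries per column.
Lam = np.zeros((p, k))
for j in range(k):
    idx = rng.choice(p, size=s, replace=False)
    Lam[idx, j] = rng.normal(0.0, 2.0, size=s)
omega = rng.uniform(0.5, 1.5, size=p)

# Operator (spectral) norms of the two pieces of Sigma = Lam Lam^T + diag(omega).
low_rank_norm = np.linalg.norm(Lam @ Lam.T, 2)  # top eigenvalue of the rank-k part
residual_norm = omega.max()                     # operator norm of the diagonal part
print(low_rank_norm, residual_norm)             # the low-rank part dominates
```

Because the operator-norm discrepancy lives almost entirely in a $k$-dimensional subspace, tests can project the data onto candidate top eigenspaces rather than confront the full $p \times p$ matrix.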

Proofs build on detailed metric entropy calculations, sharp bounds on prior masses, and control of test errors, encompassing random matrix theory and empirical process theory tools.

Numerical Experiments

The simulation studies compare the proposed Bayesian methods with frequentist techniques such as POET and adaptive thresholding, evaluating covariance estimation accuracy in operator norm under varying $p$, $n$, $k_{0n}$, and $s_n$, for both well-specified and misspecified noise structures.

The continuous shrinkage prior outperforms or matches other methods in all settings, especially as model complexity increases, and is robust to the absence of exact sparsity or diagonal noise. Point mass mixture priors deteriorate in very large models, primarily due to computational mixing issues. The empirical results thus reinforce the theoretical findings.
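As a point of reference for why regularization is needed at all (a baseline illustration, not a reproduction of the paper's POET comparison), the raw sample covariance is badly inconsistent in operator norm when $p \gg n$, even for the simplest possible truth:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 400, 50
Sigma0 = np.eye(p)            # truth: identity covariance (no structure to miss)

Y = rng.normal(size=(n, p))   # n iid samples from N_p(0, I)
S = Y.T @ Y / n               # sample covariance

err = np.linalg.norm(S - Sigma0, 2)
print(err)  # of order 2*sqrt(p/n) + p/n by random matrix theory; nowhere near zero
```

Structured estimators (factor models, thresholding) avoid this by spending the $n$ samples on far fewer effective parameters, which is exactly what the contraction rates quantify.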

Implications and Future Directions

The theoretical advances provide a framework for principled Bayesian inference for covariance estimation under realistic high-dimensional settings with latent structure and sparsity. The results justify the use of continuous shrinkage priors as computational surrogates for spike-and-slab approaches, with guarantees matching minimax frequentist rates up to log-factors.

Several avenues for further research are apparent:

  • Extension to Approximate Factor Models: Relaxing structural assumptions to allow for non-diagonal idiosyncratic variance and weakly sparse low-rank structure.
  • Adaptive or Empirical Bayes Procedures: Prior hyperparameter tuning for optimal adaptation to unknown sparsity and eigenvalue growth.
  • Posterior Convergence for Functional Parameters: Extension to linear or quadratic functionals of high-dimensional covariance (e.g., prediction under factor models).
  • Sharper Minimax Adaptivity: Investigation of potential improvements to eliminate the $\sqrt{\log n}$ factor and extensions to more general sparsity regimes.

Conclusion

This work provides a rigorous analysis of posterior contraction properties for a broad class of Bayesian factor models under sparsity, for covariance estimation in "ultra"-high-dimensional settings. By establishing precise rates, matching minimax lower bounds, and demonstrating practical computational robustness, it strongly supports the use of Bayesian shrinkage methods, both theoretically and empirically, in modern high-dimensional statistical inference (1206.3627).
