Dirac Spike-and-Slab Prior
- The Dirac spike-and-slab prior is a Bayesian mixture prior that combines a Dirac delta spike at zero with a continuous slab, promoting exact sparsity in model parameters.
- Empirical Bayes techniques calibrate the mixing weight via marginal likelihood, with the choice of slab density (Laplace vs Cauchy) critically affecting posterior contraction rates.
- The full posterior factorizes across coefficients, which supports variable selection and adaptive regularization; hierarchical models can offer improved uncertainty quantification.
The Dirac spike-and-slab prior is a foundational Bayesian mixture prior designed to induce exact sparsity in parameter estimation, especially for high-dimensional models involving variable or function selection. It takes the form of a product mixture where the "spike" is a Dirac delta mass at zero, ensuring that some parameters are identically zero with positive probability, while the "slab" is a diffuse continuous density—typically Gaussian, Laplace, or Cauchy—capturing the nonzero coefficients. The empirical Bayes approach calibrates the key mixing weight via marginal maximum likelihood, and selection of slab density is central for achieving optimal posterior contraction rates. The full posterior under this prior is a product of discrete-continuous mixtures across model coefficients, with each coordinate governed by its inclusion probability and corresponding slab.
1. Mathematical Formulation and Model Structure
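The canonical sparse normal-means model observes $X_i = \theta_{0,i} + \varepsilon_i$, with $\varepsilon_i \sim N(0,1)$ i.i.d. for $i = 1, \dots, n$, and imposes the sparsity constraint $\theta_0 \in \ell_0[s_n] = \{\theta \in \mathbb{R}^n : \#\{i : \theta_i \neq 0\} \le s_n\}$, i.e., at most $s_n$ nonzero entries in the mean vector. The Dirac spike-and-slab prior is specified as
$$\theta = (\theta_1, \dots, \theta_n) \sim \Pi_\alpha,$$
where each $\theta_i$ is drawn independently from
$$(1-\alpha)\,\delta_0 + \alpha\,G, \qquad dG(u) = \gamma(u)\,du.$$
Here, $\delta_0$ is the Dirac delta at $0$ (the spike), $\gamma$ is a continuous slab density, and $\alpha \in [0,1]$ is the mixing proportion controlling expected sparsity. Under the Gaussian likelihood, the posterior remains in product form with closed expressions for inclusion probabilities and conditional slab densities:
$$\Pi_\alpha(\cdot \mid X) = \bigotimes_{i=1}^{n} \big[(1 - a(X_i))\,\delta_0 + a(X_i)\,\gamma_{X_i}\big],$$
where
$$a(x) = \frac{\alpha\, g(x)}{(1-\alpha)\,\varphi(x) + \alpha\, g(x)}, \qquad \gamma_x(u) = \frac{\varphi(x-u)\,\gamma(u)}{g(x)},$$
with $\varphi$ the standard normal density and $g = \varphi * \gamma$ its convolution with $\gamma$.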
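As an illustration of the inclusion probability, the sketch below evaluates it for a Laplace slab, for which the convolution of the standard normal density with the slab has a closed form. Writing $a(x)$ for the posterior probability that a coordinate with observation $x$ is nonzero, the code computes $a(x) = \alpha g(x) / ((1-\alpha)\varphi(x) + \alpha g(x))$. The function names and the `rate` parameter of the Laplace density are assumptions made for this example, and the formula is only numerically safe for moderate $|x|$.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_sf(x):
    """Standard normal upper tail, 1 - norm_cdf(x), computed stably."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def g_laplace(x, rate=1.0):
    """Convolution of phi with the Laplace density (rate/2) * exp(-rate*|u|).

    Obtained by completing the square separately on u > 0 and u < 0.
    """
    x = abs(x)  # the convolution is symmetric in x
    return 0.5 * rate * math.exp(0.5 * rate * rate) * (
        math.exp(-rate * x) * norm_cdf(x - rate)
        + math.exp(rate * x) * norm_sf(x + rate)
    )


def inclusion_prob(x, alpha, rate=1.0):
    """Posterior probability that the coordinate with observation x is nonzero."""
    slab = alpha * g_laplace(x, rate)
    spike = (1.0 - alpha) * phi(x)
    return slab / (spike + slab)
```

For a fixed weight such as `alpha = 0.1`, `inclusion_prob` increases with `abs(x)`: observations near zero are mostly attributed to the spike, while large observations are attributed almost entirely to the slab.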
2. Empirical Bayes Calibration
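Empirical Bayes estimation uses marginal maximum likelihood to select the sparsity parameter $\alpha$. The marginal log-likelihood for $\alpha$ is
$$\ell(\alpha) = \sum_{i=1}^{n} \log\big((1-\alpha)\,\varphi(X_i) + \alpha\, g(X_i)\big),$$
where $\varphi$ is the standard normal density and $g = \varphi * \gamma$ is its convolution with the slab density $\gamma$. A threshold-based lower bound $\alpha_n$ (e.g., corresponding to $t(\alpha_n) = \sqrt{2 \log n}$, with $t(\alpha)$ the selection threshold associated with the weight $\alpha$) restricts the maximization domain. The mixing parameter is then
$$\hat\alpha = \arg\max_{\alpha \in [\alpha_n, 1]} \ell(\alpha).$$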
Plug-in posteriors yield sparse estimates that adapt to the empirical complexity of the problem.
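The calibration step can be sketched in a few lines. The marginal log-likelihood is a sum of logarithms of functions that are affine in the mixing weight, so it is concave and its derivative (the score) is decreasing, which makes bisection on the score sufficient. The sketch assumes a Laplace slab with a hypothetical `rate` parameter and, for simplicity, uses `1/n` as the lower end of the search interval instead of a threshold-based bound; both choices are assumptions of this example, not prescriptions from the source.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def g_laplace(x, rate=1.0):
    """Convolution of phi with the Laplace density (rate/2) * exp(-rate*|u|)."""
    x = abs(x)
    cdf = 0.5 * math.erfc(-(x - rate) / math.sqrt(2.0))  # Phi(x - rate)
    sf = 0.5 * math.erfc((x + rate) / math.sqrt(2.0))    # 1 - Phi(x + rate)
    return 0.5 * rate * math.exp(0.5 * rate * rate) * (
        math.exp(-rate * x) * cdf + math.exp(rate * x) * sf
    )


def score(alpha, xs, rate=1.0):
    """Derivative in alpha of sum_i log((1 - alpha) * phi(x_i) + alpha * g(x_i))."""
    total = 0.0
    for x in xs:
        p, g = phi(x), g_laplace(x, rate)
        total += (g - p) / ((1.0 - alpha) * p + alpha * g)
    return total


def marginal_mle(xs, alpha_min=None, rate=1.0, tol=1e-10):
    """Maximize the marginal log-likelihood over [alpha_min, 1] by bisection.

    alpha_min defaults to 1/n, a simplification assumed for this sketch.
    """
    lo = 1.0 / len(xs) if alpha_min is None else alpha_min
    hi = 1.0
    if score(lo, xs, rate) <= 0.0:   # likelihood decreasing on the interval
        return lo
    if score(hi, xs, rate) >= 0.0:   # likelihood increasing on the interval
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs, rate) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On a toy data set such as `[0.0] * 190 + [5.0] * 10`, the maximizer lies strictly inside the interval, whereas data consisting only of zeros push the estimate to the lower bound. The plug-in posterior is then obtained by substituting the returned value for the mixing weight.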
3. Convergence Rates and Slab Selection
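Let $r_n \asymp s_n \log(n/s_n)$ denote the minimax squared-$\ell_2$-risk rate over $\ell_0[s_n]$. The choice of slab density critically affects concentration of the empirical Bayes posterior $\Pi_{\hat\alpha}(\cdot \mid X)$:
- Laplace slab ($\gamma(u) = \tfrac{\lambda}{2} e^{-\lambda |u|}$) leads to suboptimal full posterior risk: for some $\theta_0 \in \ell_0[s_n]$, the expected posterior second moment $E_{\theta_0} \int \|\theta - \theta_0\|^2 \, d\Pi_{\hat\alpha}(\theta \mid X)$ is of strictly larger order than $r_n$.
- Cauchy slab ($\gamma(u) = \tfrac{1}{\pi(1+u^2)}$) delivers optimal concentration: $\sup_{\theta_0 \in \ell_0[s_n]} E_{\theta_0} \int \|\theta - \theta_0\|^2 \, d\Pi_{\hat\alpha}(\theta \mid X) \le C\, r_n$ for some constant $C > 0$.

A plausible implication is that heavy-tailed slabs enable minimax contraction of the full posterior; Laplace-type slabs may only suffice for posterior mean or median estimates.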
4. Structure of the Full Posterior Distribution
The posterior induced by a Dirac spike-and-slab prior, under normal likelihood, factors across coordinates. Each coordinate has an exact-zero event with positive probability due to the spike and, conditional on inclusion, is shrunk relative to the observed data by an amount determined by the slab density. This mixture structure supports variable selection and facilitates interpretation: inactive coefficients are exactly zero, while active ones are adaptively regularized by the slab.
For empirical Bayes, the posterior mean and median may achieve minimax rates under a Laplace slab, but the credible set diameter and posterior second moment can be substantially inflated, which is why full posterior properties need to be examined and not only summary statistics.
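The distinction between a point summary and the full posterior can be made concrete coordinate by coordinate. The sketch below approximates, by a midpoint rule, the slab convolution and the first two moments of the conditional slab posterior, and combines them into the inclusion probability, the posterior mean, and the posterior second moment around a reference value `theta0`. The quadrature settings, the unit-rate Laplace and standard Cauchy slabs, and all function names are assumptions of this example; it is a numerical illustration, not a reproduction of the theoretical rates.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace(u, rate=1.0):
    """Laplace slab density."""
    return 0.5 * rate * math.exp(-rate * abs(u))


def cauchy(u):
    """Standard Cauchy slab density."""
    return 1.0 / (math.pi * (1.0 + u * u))


def slab_moments(x, slab, half_width=12.0, steps=4800):
    """Midpoint-rule values of g(x) and of the mean and second moment
    of the conditional slab posterior, proportional to phi(x - u) * slab(u)."""
    h = 2.0 * half_width / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        u = x - half_width + (k + 0.5) * h
        w = phi(x - u) * slab(u) * h
        m0 += w
        m1 += w * u
        m2 += w * u * u
    return m0, m1 / m0, m2 / m0


def coordinate_summary(x, theta0, alpha, slab):
    """Inclusion probability, posterior mean, and posterior second moment
    around theta0 for one coordinate with observation x."""
    g, mean_slab, second_slab = slab_moments(x, slab)
    a_x = alpha * g / ((1.0 - alpha) * phi(x) + alpha * g)
    post_mean = a_x * mean_slab
    # E[(theta - theta0)^2 | x] splits into the spike part and the slab part.
    second = (1.0 - a_x) * theta0 ** 2 + a_x * (
        second_slab - 2.0 * theta0 * mean_slab + theta0 ** 2
    )
    return a_x, post_mean, second
```

Summing the third returned value over coordinates gives the posterior second moment of the whole vector, which is the quantity that can be inflated even when the posterior mean is accurate. Calling `coordinate_summary(x, theta0, alpha, laplace)` and `coordinate_summary(x, theta0, alpha, cauchy)` on the same inputs allows a direct comparison of the two slabs.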
5. Slab Density Choice: Comparative Analysis
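| Slab Density | Slab Function | Posterior Rate |
|---|---|---|
| Laplace | $\gamma(u) = \tfrac{\lambda}{2} e^{-\lambda \lvert u \rvert}$ | Suboptimal: larger order than the minimax rate (fails for full posterior risk) |
| Cauchy | $\gamma(u) = \tfrac{1}{\pi(1+u^2)}$ | Optimal: of order $s_n \log(n/s_n)$ (minimax contraction for second moment) |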
The observed phenomena suggest that heavy-tailed slabs are essential for minimax posterior contraction and robust credible set construction. The identity of the slab function dictates whether the Bayesian procedure achieves optimal uncertainty quantification and coverage.
6. Hierarchical versus Plug-in Bayes and Complexity Penalization
While empirical Bayes using marginal MLE calibration of $\alpha$ can result in undersmoothing (especially for Laplace slabs) and suboptimal full posterior contraction, fully hierarchical Bayes (placing, e.g., a Beta prior on $\alpha$) recovers minimax posterior concentration even with a Laplace slab. This reflects the role of the hierarchical complexity penalty in controlling over-inclusion. In practical terms, empirical Bayes credible balls for Laplace slabs cover well but are too large; hierarchical approaches temper this inflation by integrating over sparsity levels.
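One way to see how a hierarchical formulation integrates over sparsity levels is a small Gibbs sampler. With the coefficients integrated out, the inclusion indicators are conditionally independent Bernoulli variables given the mixing weight, and the weight given the indicators is again Beta distributed. The sketch below assumes a Laplace slab with a hypothetical `rate` parameter and a Beta(1, n + 1) prior on the weight; that prior, the iteration counts, and the function names are illustrative assumptions, not choices taken from the source.

```python
import math
import random


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def g_laplace(x, rate=1.0):
    """Convolution of phi with the Laplace density (rate/2) * exp(-rate*|u|)."""
    x = abs(x)
    cdf = 0.5 * math.erfc(-(x - rate) / math.sqrt(2.0))
    sf = 0.5 * math.erfc((x + rate) / math.sqrt(2.0))
    return 0.5 * rate * math.exp(0.5 * rate * rate) * (
        math.exp(-rate * x) * cdf + math.exp(rate * x) * sf
    )


def gibbs_inclusion_frequencies(xs, b1=1.0, b2=None, iters=2000, burn=500,
                                seed=0, rate=1.0):
    """Posterior inclusion frequencies under a Beta(b1, b2) prior on alpha.

    Alternates z_i | alpha, x_i ~ Bernoulli(a_alpha(x_i)) with
    alpha | z ~ Beta(b1 + k, b2 + n - k), where k counts included coordinates.
    """
    rng = random.Random(seed)
    n = len(xs)
    b2 = n + 1.0 if b2 is None else b2  # assumed default, for illustration only
    dens = [(phi(x), g_laplace(x, rate)) for x in xs]
    alpha = b1 / (b1 + b2)              # start at the prior mean
    counts = [0] * n
    for it in range(iters):
        k = 0
        for i, (p, g) in enumerate(dens):
            prob = alpha * g / ((1.0 - alpha) * p + alpha * g)
            z = 1 if rng.random() < prob else 0
            k += z
            if it >= burn:
                counts[i] += z
        alpha = rng.betavariate(b1 + k, b2 + n - k)
    return [c / (iters - burn) for c in counts]
```

The returned frequencies estimate the marginal posterior inclusion probabilities with the mixing weight averaged out, in contrast to the plug-in approach, which conditions on a single estimated weight.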
7. Practical and Theoretical Implications
Empirical analysis and theoretical results indicate that the Dirac spike-and-slab prior creates exact zeros in the posterior, supports variable selection, and, when paired with heavy-tailed slabs or hierarchical calibration, yields minimax contraction rates for the entire posterior. The resulting Bayesian credible sets exhibit sharp separation between included and excluded variables. However, practitioners should avoid naive plug-in approaches with Laplace slabs for uncertainty quantification tasks; hierarchical formulations or heavy-tailed slabs are required for fully optimal inference.
References:
- Castillo & Mismer (2018), "Empirical Bayes analysis of spike and slab posterior distributions."