
Dirac Spike-and-Slab Prior

Updated 17 January 2026
  • The Dirac spike-and-slab prior is a Bayesian mixture prior that combines a Dirac delta spike at zero with a continuous slab, promoting exact sparsity in model parameters.
  • Empirical Bayes techniques calibrate the mixing weight via marginal likelihood, with the choice of slab density (Laplace vs Cauchy) critically affecting posterior contraction rates.
  • The full posterior factorizes across coefficients, ensuring model selection consistency and adaptive regularization, with hierarchical models offering improved uncertainty quantification.

The Dirac spike-and-slab prior is a foundational Bayesian mixture prior designed to induce exact sparsity in parameter estimation, especially for high-dimensional models involving variable or function selection. It takes the form of a product mixture where the "spike" is a Dirac delta mass at zero, ensuring that some parameters are identically zero with positive probability, while the "slab" is a diffuse continuous density—typically Gaussian, Laplace, or Cauchy—capturing the nonzero coefficients. The empirical Bayes approach calibrates the key mixing weight via marginal maximum likelihood, and selection of slab density is central for achieving optimal posterior contraction rates. The full posterior under this prior is a product of discrete-continuous mixtures across model coefficients, with each coordinate governed by its inclusion probability and corresponding slab.

1. Mathematical Formulation and Model Structure

The canonical sparse normal-means model observes $X_i = \theta_i + \epsilon_i$, with $\epsilon_i \sim N(0,1)$ for $i = 1, \dots, n$, and assumes that the true mean vector satisfies $\theta_0 \in \ell_0[s_n]$, i.e., it has at most $s_n$ nonzero entries. The Dirac spike-and-slab prior is specified as

$$\Pi_\alpha = \bigotimes_{i=1}^n \left[ (1-\alpha)\, \delta_0 + \alpha\, G \right],$$

where each $\theta_i$ is drawn independently from

$$\pi(\theta_i \mid \alpha) = (1-\alpha)\, \delta_0(\theta_i) + \alpha\, g(\theta_i).$$

Here, $\delta_0$ is the Dirac delta at $0$ (the spike), $g$ is a continuous slab density with distribution $G$, and $\alpha \in [0,1]$ is the mixing proportion controlling expected sparsity. Under the Gaussian likelihood, the posterior remains in product form with closed-form expressions for the inclusion probabilities and conditional slab densities:

$$\Pi_\alpha(d\theta \mid X) = \bigotimes_{i=1}^n \left[ (1 - a_\alpha(X_i))\,\delta_0 + a_\alpha(X_i)\, G_{X_i} \right],$$

where

$$a_\alpha(x) = \frac{\alpha\, g_X(x)}{(1-\alpha)\,\phi(x) + \alpha\, g_X(x)},$$

with $\phi$ the standard normal density, $g_X = \phi * g$ its convolution with $g$, and $G_x$ the posterior distribution of $\theta_i$ given $X_i = x$ under the slab alone, whose density is proportional to $\phi(x - \theta)\, g(\theta)$.
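The inclusion probability $a_\alpha(x)$ can be evaluated numerically for any slab. The following is a minimal sketch, not a reference implementation: it approximates the convolution $g_X$ with a trapezoid rule, and the function names, integration window (`half_width`), and grid size (`steps`) are illustrative assumptions rather than anything prescribed by the source.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace_slab(theta):
    """Laplace slab g(theta) = exp(-|theta|) / 2."""
    return 0.5 * math.exp(-abs(theta))


def cauchy_slab(theta):
    """Standard Cauchy slab g(theta) = 1 / (pi * (1 + theta^2))."""
    return 1.0 / (math.pi * (1.0 + theta * theta))


def convolved_slab(x, g, half_width=10.0, steps=4000):
    """Approximate g_X(x) = integral of phi(x - theta) * g(theta) d(theta).

    Trapezoid rule on [x - half_width, x + half_width]; the Gaussian factor
    makes the contribution from outside this window negligible.
    (half_width and steps are illustrative choices.)
    """
    lo = x - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        theta = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * phi(x - theta) * g(theta)
    return total * h


def inclusion_probability(x, alpha, g):
    """Posterior probability a_alpha(x) that theta_i is nonzero given X_i = x."""
    gx = convolved_slab(x, g)
    return alpha * gx / ((1.0 - alpha) * phi(x) + alpha * gx)


if __name__ == "__main__":
    for x in (0.0, 2.0, 4.0):
        print(x,
              inclusion_probability(x, 0.05, laplace_slab),
              inclusion_probability(x, 0.05, cauchy_slab))
```

For a fixed $\alpha$, observations near zero receive a small inclusion probability and large observations receive one close to 1, which is the coordinate-wise selection behaviour described above.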

2. Empirical Bayes Calibration

Empirical Bayes estimation uses marginal maximum likelihood to select the sparsity parameter $\alpha$. The marginal log-likelihood for $X$ is

$$\ell_n(\alpha; X) = \sum_{i=1}^n \log\left( (1-\alpha)\,\phi(X_i) + \alpha\, g_X(X_i) \right).$$

A threshold-based lower bound $\alpha_n$ (e.g., corresponding to $t(\alpha_n) = \sqrt{2\log n}$) restricts the maximization domain. The mixing parameter is then

$$\hat{\alpha} = \arg\max_{\alpha \in [\alpha_n, 1]} \ell_n(\alpha; X).$$

The plug-in posterior $\Pi_{\hat{\alpha}}$ yields sparse estimates that adapt to the unknown sparsity level of the data.
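The maximization of $\ell_n$ can be sketched as follows. Because each summand is the logarithm of a function that is linear in $\alpha$, $\ell_n$ is concave and its derivative (the score) is non-increasing, so a bisection on the score is enough. This sketch assumes a Laplace slab, for which $g_X(x) = \tfrac{1}{2} e^{1/2}\left[e^{-x}\Phi(x-1) + e^{x}\Phi(-x-1)\right]$ in closed form. The lower bound `alpha_min` stands in for $\alpha_n$ and is supplied by the caller; it is not derived here from the threshold condition. All function names are hypothetical.

```python
import math
import random


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def laplace_convolved(x):
    """Closed form of g_X = phi * g for the Laplace slab g(t) = exp(-|t|) / 2."""
    return 0.5 * math.exp(0.5) * (
        math.exp(-x) * std_normal_cdf(x - 1.0)
        + math.exp(x) * std_normal_cdf(-x - 1.0)
    )


def score(alpha, phis, gxs):
    """Derivative of the marginal log-likelihood ell_n with respect to alpha."""
    return sum((g - p) / ((1.0 - alpha) * p + alpha * g)
               for p, g in zip(phis, gxs))


def estimate_alpha(xs, alpha_min, tol=1e-10):
    """Maximize ell_n over [alpha_min, 1] by bisection on the score."""
    phis = [phi(x) for x in xs]
    gxs = [laplace_convolved(x) for x in xs]
    # Concavity: if the score is already non-positive at the left end,
    # the maximum is at alpha_min; if still non-negative at 1, it is at 1.
    if score(alpha_min, phis, gxs) <= 0.0:
        return alpha_min
    if score(1.0, phis, gxs) >= 0.0:
        return 1.0
    lo, hi = alpha_min, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, phis, gxs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    random.seed(0)
    n, s = 200, 20
    # Simulated data: s signals with mean 5, the rest pure noise.
    xs = [random.gauss(5.0, 1.0) for _ in range(s)]
    xs += [random.gauss(0.0, 1.0) for _ in range(n - s)]
    # alpha_min = 1/n is a placeholder, not the alpha_n of the text.
    print(estimate_alpha(xs, alpha_min=1.0 / n))
```

The returned value can then be plugged into the coordinate-wise posterior in place of $\alpha$.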

3. Convergence Rates and Slab Selection

Let $r_n = 2 s_n \log(n/s_n)$ denote the minimax $\ell_2$-risk rate over $\ell_0[s_n]$. The choice of slab density critically affects concentration:

  • Laplace slab ($g(\theta) = \frac{1}{2} e^{-|\theta|}$) leads to suboptimal full posterior risk: $E_{\theta_0}\int \|\theta - \theta_0\|^2\, d\Pi_{\hat\alpha}(\theta \mid X) \gtrsim s_n \exp\left( \sqrt{\log(n/s_n)} \right)$, far exceeding $r_n$.
  • Cauchy slab ($g(\theta) = \frac{1}{\pi}(1+\theta^2)^{-1}$) delivers optimal concentration: $\sup_{\theta_0 \in \ell_0[s_n]} E_{\theta_0} \int \|\theta - \theta_0\|^2\, d\Pi_{\hat\alpha}(\theta \mid X) \le C r_n$.

A plausible implication is that heavy-tailed slabs enable minimax contraction of the full posterior; Laplace-type slabs may only suffice for posterior mean or median estimates.
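The quantity bounded in both cases, $\int \|\theta - \theta_0\|^2\, d\Pi_\alpha(\theta \mid X)$, splits into a sum over coordinates because the posterior is a product. For each coordinate the spike contributes $(1 - a_\alpha(X_i))\,\theta_{0,i}^2$ and the slab contributes $a_\alpha(X_i) \int (\theta - \theta_{0,i})^2\, dG_{X_i}(\theta)$. The sketch below evaluates this for a fixed $\alpha$ and a given data set using a trapezoid rule. It only illustrates the computation and does not reproduce the asymptotic bounds; the function names, integration window, and toy inputs are assumptions.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace_slab(theta):
    return 0.5 * math.exp(-abs(theta))


def cauchy_slab(theta):
    return 1.0 / (math.pi * (1.0 + theta * theta))


def integrate(f, lo, hi, steps=4000):
    """Composite trapezoid rule for f on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


def coordinate_loss(x, theta0, alpha, g, half_width=10.0):
    """Posterior expectation of (theta - theta0)^2 for one coordinate."""
    lo, hi = x - half_width, x + half_width
    # g_X(x): marginal density of x under the slab.
    gx = integrate(lambda t: phi(x - t) * g(t), lo, hi)
    # Unnormalized slab second moment around theta0.
    second = integrate(lambda t: (t - theta0) ** 2 * phi(x - t) * g(t), lo, hi)
    denom = (1.0 - alpha) * phi(x) + alpha * gx
    spike_weight = (1.0 - alpha) * phi(x) / denom   # equals 1 - a_alpha(x)
    return spike_weight * theta0 ** 2 + alpha * second / denom


def posterior_expected_loss(xs, theta0s, alpha, g):
    """Sum of coordinate-wise losses, valid because the posterior factorizes."""
    return sum(coordinate_loss(x, t0, alpha, g) for x, t0 in zip(xs, theta0s))


if __name__ == "__main__":
    xs = [0.3, -1.1, 4.2]        # toy observations (illustrative)
    theta0s = [0.0, 0.0, 4.0]    # toy true means (illustrative)
    for name, g in (("laplace", laplace_slab), ("cauchy", cauchy_slab)):
        print(name, posterior_expected_loss(xs, theta0s, alpha=0.1, g=g))
```

Averaging this quantity over repeated draws of $X$, with $\alpha$ replaced by its empirical Bayes estimate, gives a Monte Carlo approximation of the risk that the two bounds describe.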

4. Structure of the Full Posterior Distribution

The posterior induced by a Dirac spike-and-slab prior, under normal likelihood and for a given mixing weight, factors across coordinates. Each coordinate exhibits an exact-zero event with positive probability due to the spike, and, conditional on inclusion, exhibits posterior shrinkage around the observed data modulated by the slab density. This mixture structure supports model selection and facilitates interpretation: inactive coefficients are exactly zero, while active ones are adaptively regularized by the slab.

For empirical Bayes, the posterior mean and median may achieve minimax rates under a Laplace slab, but the credible set diameter and the posterior second moment can be substantially inflated, underscoring the need to consider properties of the full posterior rather than summary statistics alone.

5. Slab Density Choice: Comparative Analysis

| Slab density | Slab function | Posterior rate |
|---|---|---|
| Laplace | $g(\theta) = \frac{1}{2} e^{-\lvert\theta\rvert}$ | Suboptimal: $s_n \exp(\sqrt{\log(n/s_n)})$ (fails for full posterior risk) |
| Cauchy | $g(\theta) = \frac{1}{\pi}(1+\theta^2)^{-1}$ | Optimal: $C r_n$ (minimax contraction for second moment) |

The observed phenomena suggest that heavy-tailed slabs are essential for minimax posterior contraction and robust credible set construction. The identity of the slab function dictates whether the Bayesian procedure achieves optimal uncertainty quantification and coverage.

6. Hierarchical versus Plug-in Bayes and Complexity Penalization

While empirical Bayes with marginal MLE calibration of $\alpha$ can result in undersmoothing (especially for Laplace slabs) and suboptimal full posterior contraction, fully hierarchical Bayes (placing, e.g., a Beta prior on $\alpha$) recovers minimax posterior concentration even with a Laplace slab. This reflects the role of the hierarchical complexity penalty in controlling over-inclusion and the resulting undersmoothing. In practical terms, empirical Bayes credible balls for Laplace slabs cover well but are too large; hierarchical approaches temper this inflation by integrating over sparsity levels.
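Integrating over sparsity levels can be sketched on a grid. Given $\alpha$ the coordinates are independent, so the hierarchical inclusion probability of coordinate $i$ is the posterior average of $a_\alpha(X_i)$, with the posterior of $\alpha$ proportional to the Beta prior density times $\prod_i \left[(1-\alpha)\phi(X_i) + \alpha\, g_X(X_i)\right]$. The sketch below assumes a Laplace slab (closed-form $g_X$) and a Beta$(1, n+1)$ prior. These hyperparameters are an illustrative assumption, since the text only says "a Beta prior", and the grid size and function names are likewise hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def laplace_convolved(x):
    """Closed form of g_X = phi * g for the Laplace slab g(t) = exp(-|t|) / 2."""
    return 0.5 * math.exp(0.5) * (
        math.exp(-x) * std_normal_cdf(x - 1.0)
        + math.exp(x) * std_normal_cdf(-x - 1.0)
    )


def hierarchical_inclusion(xs, a=1.0, b=None, grid=2000):
    """Inclusion probabilities under a Beta(a, b) prior on alpha.

    Returns (list of P(theta_i != 0 | X), posterior mean of alpha).
    Default b = n + 1 is an assumed hyperparameter choice.
    """
    n = len(xs)
    if b is None:
        b = n + 1.0
    phis = [phi(x) for x in xs]
    gxs = [laplace_convolved(x) for x in xs]
    # Midpoint grid on (0, 1) keeps log(alpha) and log(1 - alpha) finite.
    alphas = [(k + 0.5) / grid for k in range(grid)]
    log_post = []
    for al in alphas:
        lp = (a - 1.0) * math.log(al) + (b - 1.0) * math.log(1.0 - al)
        lp += sum(math.log((1.0 - al) * p + al * g) for p, g in zip(phis, gxs))
        log_post.append(lp)
    # Normalize in log space to avoid underflow.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    weights = [w / z for w in weights]
    probs = []
    for p, g in zip(phis, gxs):
        probs.append(sum(w * al * g / ((1.0 - al) * p + al * g)
                         for w, al in zip(weights, alphas)))
    post_mean_alpha = sum(w * al for w, al in zip(weights, alphas))
    return probs, post_mean_alpha


if __name__ == "__main__":
    xs = [0.0] * 45 + [6.0] * 5   # toy data: 45 null-like, 5 large observations
    probs, post_mean = hierarchical_inclusion(xs)
    print(post_mean, probs[0], probs[-1])
```

Unlike the plug-in posterior, the resulting joint posterior no longer factorizes across coordinates, since all coordinates share the same random $\alpha$.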

7. Practical and Theoretical Implications

Empirical analysis and theoretical results indicate that the Dirac spike-and-slab prior creates exact zeros in the posterior, supports consistent model selection, and, when paired with heavy-tailed slabs or hierarchical calibration, yields minimax contraction rates for the entire posterior. The resulting Bayesian credible sets exhibit sharp separation between included and excluded variables. However, practitioners should avoid naive plug-in approaches with Laplace slabs for uncertainty quantification tasks; hierarchical formulations or heavy-tailed slabs are required for fully optimal inference.

References:

  • Castillo & Mismer (2018), "Empirical Bayes analysis of spike and slab posterior distributions".