
Bayesian Spike-and-Slab Framework

Updated 1 February 2026
  • Bayesian Spike-and-Slab Framework is a hierarchical model that combines a sharp 'spike' at zero with a diffuse 'slab' to represent coefficients for sparse estimation.
  • It enables effective variable selection and uncertainty quantification in high-dimensional regression by inducing a multimodal posterior over sparse supports.
  • The framework leverages restricted isometry properties and efficient rejection sampling techniques to ensure accurate, scalable posterior sampling with formal computational guarantees.

A Bayesian Spike-and-Slab Framework is a hierarchical probabilistic modeling approach for sparse estimation, variable selection, and uncertainty quantification in high-dimensional inference problems. The framework combines a "spike" component (typically a point mass at zero or a sharply peaked continuous density) with a "slab" component (a diffuse or heavy-tailed density) as a prior distribution over model coefficients, supports, or structural parameters. This yields a multimodal posterior that encodes combinatorial uncertainty over sparsity patterns and continuous uncertainty over effect sizes, with formal guarantees and efficient computational algorithms now available for exact posterior sampling in regimes previously inaccessible to traditional methods (Kumar et al., 4 Mar 2025).

1. Model Specification and Prior Construction

Bayesian spike-and-slab regression is defined via observations $y = X\theta + \epsilon$, with $\epsilon \sim N(0, \sigma^2 I_n)$, $X \in \mathbb{R}^{n \times d}$, and an unknown sparse $\theta \in \mathbb{R}^d$. The canonical spike-and-slab prior is

$$\pi(\theta) = \bigotimes_{i=1}^{d} \Big[(1-\alpha_i)\,\delta_0 + \alpha_i\,\mu\Big]$$

where each coordinate is zero (the "spike") with probability $1-\alpha_i$ or drawn i.i.d. from the slab density $\mu$ (e.g., $N(0,1)$ or Laplace) with probability $\alpha_i$ (Kumar et al., 4 Mar 2025). This induces a prior over supports $S = \mathrm{supp}(\theta)$:

$$P(S) = \prod_{i \in S}\alpha_i \prod_{i \notin S}(1-\alpha_i), \qquad \theta_S \mid S \sim \mu^{\otimes |S|}, \quad \theta_{S^c} = 0.$$

The prior is thus both discrete (over possible supports) and continuous (over effect sizes in the slab).
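As a concrete illustration, drawing from this product prior is straightforward; the sketch below (function name and interface are my own, with a $N(0,1)$ or $\mathrm{Lap}(0,1)$ slab assumed) samples $\theta$ coordinatewise:

```python
import numpy as np

def sample_spike_and_slab_prior(d, alpha, rng, slab="gaussian"):
    """Draw theta ~ prod_i [(1 - alpha_i) delta_0 + alpha_i mu]."""
    alpha = np.broadcast_to(alpha, (d,))
    active = rng.random(d) < alpha            # support indicators
    if slab == "gaussian":
        effects = rng.standard_normal(d)      # mu = N(0, 1)
    else:
        effects = rng.laplace(0.0, 1.0, d)    # mu = Lap(0, 1)
    return np.where(active, effects, 0.0)     # spike: exact zeros

rng = np.random.default_rng(0)
theta = sample_spike_and_slab_prior(d=20, alpha=0.1, rng=rng)
```

With `alpha=0.1`, roughly 10% of coordinates are nonzero on average, reproducing the discrete-plus-continuous character of the prior.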

2. Posterior Characterization and Analytic Structure

Under Gaussian noise and a Gaussian slab, the posterior takes the form

$$\pi(\theta \mid X, y) \propto \exp\left(-\frac{\|y - X\theta\|^2}{2\sigma^2}\right)\pi(\theta).$$

In the Gaussian-slab case, one obtains an explicit mixture representation

$$\pi(\theta \mid y) = \sum_{S \subseteq [d]} w(S)\,\mathcal{N}(\mu_S, \Sigma_S)\cdot 1_{\mathrm{supp}(\theta) \subseteq S}$$

with explicit formulas for the mixture means, covariances, and weights (Kumar et al., 4 Mar 2025):

$$\Sigma_S = \left(X_S^\top X_S/\sigma^2 + I_S\right)^{-1}, \qquad \mu_S = \Sigma_S\, X_S^\top y/\sigma^2$$

$$w(S) \propto \prod_{i \in S}\frac{\alpha_i}{1-\alpha_i}\cdot\det(\Sigma_S)^{1/2}\cdot\exp\left(\frac{1}{2}\mu_S^\top \Sigma_S^{-1}\mu_S\right)$$
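For intuition, the mixture representation can be evaluated exactly by brute-force enumeration of supports when $d$ is tiny; a minimal sketch (my own illustration with a $N(0,1)$ slab, not the paper's algorithm, and only feasible for small $d$):

```python
import itertools
import numpy as np

def posterior_support_weights(X, y, alpha, sigma):
    """Exact posterior over supports S for a Gaussian slab N(0, 1),
    computed by enumerating all 2^d supports (small d only)."""
    n, d = X.shape
    log_w = {}
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            S = list(S)
            XS = X[:, S]
            prec = XS.T @ XS / sigma**2 + np.eye(len(S))   # Sigma_S^{-1}
            Sigma = np.linalg.inv(prec)
            mu = Sigma @ (XS.T @ y / sigma**2)             # mu_S
            # log w(S) = sum log(alpha/(1-alpha)) + (1/2) log det Sigma_S
            #            + (1/2) mu_S^T Sigma_S^{-1} mu_S
            lw = (np.sum(np.log(alpha[S]) - np.log(1 - alpha[S]))
                  + 0.5 * np.linalg.slogdet(Sigma)[1]
                  + 0.5 * mu @ prec @ mu)
            log_w[tuple(S)] = lw
    lws = np.array(list(log_w.values()))
    w = np.exp(lws - lws.max())   # stabilize before normalizing
    w /= w.sum()
    return dict(zip(log_w.keys(), w))
```

Enumerating all $2^d$ supports is exactly the combinatorial explosion the sampling algorithms of Sections 3–4 are designed to avoid.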

For a Laplace slab, the closed-form Gaussian integrals are lost, but the same underlying mixture structure persists, subject to numerical integration (Kumar et al., 4 Mar 2025).

3. Sampling Complexity, Restricted Isometry, and Statistical Regimes

A key insight is that posterior sampling, to be accurate and tractable in high dimensions, demands that $X$ satisfy a restricted isometry property (RIP) up to sparsity $k^\star = O(k + \log(1/\delta))$, where $k$ is the expected sparsity and $\delta$ is the total-variation error tolerance (Kumar et al., 4 Mar 2025). For Gaussian $X$ with i.i.d. $N(0, 1/n)$ entries,

  • Polynomial-time, high-accuracy sampler: $n \geq C k^3 \cdot \mathrm{polylog}(d)$ achieves TV error $\leq \delta$ in $O\big(n^2 d^{1.5}\,\mathrm{polylog}(d/(\delta \min\{1,\sigma\}))\big)$ operations.
  • Near-linear-time sampler: $n \geq C k^5 \cdot \mathrm{polylog}(d)$ achieves the same accuracy in $\widetilde{O}\big(nd \log(1/\min\{1,\sigma\})\big)$ operations (Kumar et al., 4 Mar 2025).

The RIP assumption is satisfied by standard random matrix ensembles (sub-Gaussian, subsampled Fourier, etc.) with a number of samples scaling sublinearly in the dimension $d$. These bounds move past the barriers of strictly linear sample regimes or strong-SNR assumptions common in previous literature.
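Exact RIP verification is computationally hard, but one can probe a design matrix empirically by sampling $k$-column submatrices and checking how far their squared singular values deviate from 1; a hypothetical helper (my own illustration, giving only a Monte Carlo lower bound on the RIP constant):

```python
import numpy as np

def rip_constant_lower_bound(X, k, trials=200, rng=None):
    """Monte Carlo lower bound on the order-k RIP constant of X:
    max over sampled k-subsets S of |s^2 - 1|, where s ranges over the
    singular values of X[:, S]. (Exact RIP certification is NP-hard;
    this only samples random subsets.)"""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    delta = 0.0
    for _ in range(trials):
        S = rng.choice(d, size=k, replace=False)
        s = np.linalg.svd(X[:, S], compute_uv=False)
        delta = max(delta, abs(s.max()**2 - 1.0), abs(s.min()**2 - 1.0))
    return delta
```

For a well-conditioned design such as `X = rng.standard_normal((n, d)) / np.sqrt(n)` with $n \gg k \log d$, the estimated constant stays well below 1, consistent with the $N(0, 1/n)$ ensemble discussed above.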

4. Algorithmic Advances: Hint-Vector Estimation and Product Proposals

The sampling framework proceeds in two stages:

  1. Hint-vector estimation: A fast sparse-recovery routine ($\ell_\infty$- or $\ell_2$-based) yields an initial estimate $\hat{\theta}$ with support $T \approx \mathrm{supp}(\theta^\star)$. With high posterior probability, $T \subseteq S$ for a posterior-drawn support $S$, and $\|\hat{\theta} - \theta\|$ is small (Proposition 2.8).
  2. Coordinatewise product proposal + rejection sampling: Using "recentering" (Lemma 3.3), weights over supports $S \supseteq T$ become simple coordinatewise products. A conditional Poisson product distribution over supports can be sampled in $O(d k^\star)$ time and is provably close (within constant ratios, Lemma 3.7) to the true posterior support mass, enabling TV-accurate rejection sampling of the posterior support $S$ (Kumar et al., 4 Mar 2025).

Once the support $S$ is drawn, sampling $\theta_S \sim N(\mu_S, \Sigma_S)$ is exact (Lemma 3.11).
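A highly simplified sketch of the pipeline shape (not the paper's algorithm: the hint stage below uses plain correlation screening in place of the $\ell_\infty/\ell_2$ recovery routine, and the conditional-Poisson rejection stage is omitted entirely), ending with the exact conditional draw $\theta_S \sim N(\mu_S, \Sigma_S)$:

```python
import numpy as np

def hint_support(X, y, k):
    """Stage 1 (simplified stand-in): score coordinates by |X_i^T y|
    and keep the top k as the hint support T. The paper instead uses a
    fast sparse-recovery routine with formal guarantees."""
    scores = np.abs(X.T @ y)
    return np.sort(np.argsort(scores)[-k:])

def sample_theta_given_support(X, y, S, sigma, rng):
    """Final stage (exact; cf. Lemma 3.11): draw theta_S ~ N(mu_S, Sigma_S)
    and set theta_{S^c} = 0, for a Gaussian slab N(0, 1)."""
    d = X.shape[1]
    S = list(S)
    XS = X[:, S]
    prec = XS.T @ XS / sigma**2 + np.eye(len(S))   # Sigma_S^{-1}
    Sigma = np.linalg.inv(prec)
    mu = Sigma @ (XS.T @ y / sigma**2)             # mu_S
    theta = np.zeros(d)
    theta[S] = rng.multivariate_normal(mu, Sigma)
    return theta
```

The key structural point survives even in this toy version: once a support is fixed, the continuous part of the posterior is an ordinary Gaussian and can be sampled exactly.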

5. Provable Posterior Guarantees, Sparsity, and Estimation-to-Sampling Results

The main theorem (Kumar et al., 4 Mar 2025): under the stipulated RIP and sample-size conditions,

$$\big\|\mathrm{law}(\tilde{\theta}) - \pi(\cdot \mid X, y)\big\|_{TV} \leq \delta.$$

The computational cost is as stated above, and rejection sampling mixes efficiently over the posterior support (mixing cost $O(\mathrm{poly}(C)\log(1/\delta))$). The near-linear-time sampler requires $n \geq C k^5\,\mathrm{polylog}(d)$ samples and achieves comparable accuracy.

Structural lemmas include:

  • Support sparsity (Corollary 2.10): For any product prior, $\pi\big[\,|\mathrm{supp}(\theta)| \leq 6k + O(\log(1/\delta))\,\big] \geq 1-\delta$.
  • Posterior-to-estimation (Proposition 2.8): Any estimator $\hat{\theta}$ that is accurate in a metric $m(\cdot,\cdot)$ with high probability induces a sampling procedure that, with probability $\geq 1-2\delta$, draws $\pi$-samples within twice the estimation error in $m$.

6. Extension to Laplace Slabs

For the slab $\mu(x) \propto \exp(-|x|)$, the Gaussian mixture structure lacks closed-form integrals. The algorithm adapts by:

  • Using Monte Carlo or annealing-based normalizer estimation for each mixture component (Prop 4.1, Cor 4.6) to accuracy $(1 \pm \Delta)$ in $O\big(k^4/\Delta^2 \cdot \mathrm{polylog}(kR/\Delta)\big)$ time.
  • Restricting to $\sigma = O(1/k)$ to control Laplace tail errors (Lemma 4.2).

Theorem 1.3 (Cor 4.14): For $\mu = \mathrm{Lap}(0,1)$, $\sigma = O(1/(k+\log(1/\delta)))$, and $n \geq C k^3\,\mathrm{polylog}(d/\delta)$, one can sample within TV error $\leq \delta$ in $O\big(n^2 d^{1.5}\,\mathrm{polylog}(d/(\sigma\delta)) + k^4/\delta^2 \cdot \mathrm{polylog}(d/(\sigma\delta))\big)$ time. Near-linear-time algorithms analogous to the Gaussian case hold for $n \geq k^5\,\mathrm{polylog}(d)$ (Kumar et al., 4 Mar 2025).
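To make the normalizer-estimation step concrete: the per-support component mass under a Laplace slab has no closed form, but it can be estimated by simple importance sampling from the corresponding Gaussian-slab posterior (a stand-in of my own for the paper's annealing-based estimator of Prop 4.1; all names are illustrative):

```python
import numpy as np

def laplace_normalizer_is(X_S, y, sigma, num_samples=20000, rng=None):
    """Importance-sampling estimate of the Laplace-slab component mass
      Z_S = int exp(-||y - X_S t||^2 / (2 sigma^2)) * prod_i (1/2) e^{-|t_i|} dt,
    using the Gaussian-slab posterior N(mu_S, Sigma_S) as the proposal."""
    rng = rng or np.random.default_rng()
    k = X_S.shape[1]
    prec = X_S.T @ X_S / sigma**2 + np.eye(k)      # Gaussian-slab Sigma_S^{-1}
    Sigma = np.linalg.inv(prec)
    mu = Sigma @ (X_S.T @ y / sigma**2)
    T = rng.multivariate_normal(mu, Sigma, size=num_samples)   # proposals
    # log target: Gaussian likelihood + Lap(0, 1) slab density
    resid = y[None, :] - T @ X_S.T
    log_f = (-0.5 * np.sum(resid**2, axis=1) / sigma**2
             - np.sum(np.abs(T), axis=1) - k * np.log(2.0))
    # log proposal density N(mu, Sigma)
    diff = T - mu
    _, logdet = np.linalg.slogdet(Sigma)
    log_q = (-0.5 * np.einsum('ij,jk,ik->i', diff, prec, diff)
             - 0.5 * (k * np.log(2 * np.pi) + logdet))
    return np.exp(log_f - log_q).mean()            # E_q[f/q] = Z_S
```

Because the Gaussian-slab proposal matches the likelihood term exactly, the importance weights vary only through the mild factor $e^{\|t\|^2/2 - \|t\|_1}$, so the estimator is well behaved when the posterior is concentrated; the paper's annealing scheme additionally yields formal $(1 \pm \Delta)$ accuracy guarantees.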

7. Technical and Applied Impact

This framework supplies the first polynomial-time, sublinear-measurement, provably exact samplers for spike-and-slab posteriors in high-dimensional sparse linear regression, valid for all SNRs, with a flexible extension to Laplace slabs (Kumar et al., 4 Mar 2025). The approach unifies fast sparse-recovery algorithms, precise RIP-based concentration arguments, and rejection sampling with tractable conditional-Poisson support proposals to obtain rigorous total-variation accuracy and running-time guarantees.

The framework establishes spike-and-slab posterior sampling—with certifiable accuracy and scalable computation—as the theoretical and practical gold standard for Bayesian sparse regression in high dimensions.

