
BB-SSL: Bayesian Bootstrap Spike-and-Slab LASSO

Updated 23 June 2026
  • The paper demonstrates that BB-SSL combines Bayesian bootstrap techniques with spike-and-slab LASSO priors to enable scalable approximate posterior uncertainty quantification.
  • It employs randomized MAP optimization with jittered priors and fast coordinate-descent to achieve theoretical contraction rates comparable to exact Bayesian methods.
  • The methodology offers substantial computational efficiency, excels in parallel scalability, and outperforms traditional MCMC in high-dimensional sparse regression tasks.

The Bayesian Bootstrap Spike-and-Slab LASSO (BB-SSL) is an inferential methodology that combines Bayesian bootstrap techniques and spike-and-slab LASSO priors to enable scalable approximate posterior uncertainty quantification in high-dimensional sparse regression problems. By leveraging fast coordinate-descent optimization and random perturbations—both of the data and the prior—BB-SSL yields approximate posterior draws that achieve theoretical posterior contraction rates comparable to exact Bayesian inference while offering substantial computational benefits over traditional Markov chain Monte Carlo (MCMC) approaches (Nie et al., 2020).

1. Prior Construction: Spike-and-Slab LASSO and Jittered Priors

BB-SSL is fundamentally built upon the spike-and-slab LASSO (SSL) prior for linear regression. The model takes the form $Y = X\beta + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, $\beta \in \mathbb{R}^p$. The SSL prior for each $\beta_j$ specifies a two-component Laplace mixture with mixing probability $\theta$:

$$\pi(\beta_j \mid \theta) = \theta \frac{\lambda_1}{2} e^{-\lambda_1 |\beta_j|} + (1 - \theta) \frac{\lambda_0}{2} e^{-\lambda_0 |\beta_j|},$$

with $\lambda_0 \gg \lambda_1 > 0$ and a $\mathrm{Beta}(a, b)$ prior on $\theta$.
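As a quick numerical illustration of the mixture above, the following sketch evaluates the SSL density and the conditional probability that a given coefficient value belongs to the slab. The function names and the parameter values (`theta = 0.2`, `lam0 = 20`, `lam1 = 0.1`) are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np

def laplace_pdf(x, lam):
    """Laplace density (lam / 2) * exp(-lam * |x|)."""
    return 0.5 * lam * np.exp(-lam * np.abs(x))

def ssl_prior_pdf(beta, theta, lam0, lam1):
    """Two-component SSL mixture: slab weight theta, spike weight 1 - theta."""
    return theta * laplace_pdf(beta, lam1) + (1 - theta) * laplace_pdf(beta, lam0)

def slab_probability(beta, theta, lam0, lam1):
    """Conditional probability that beta was drawn from the slab component."""
    slab = theta * laplace_pdf(beta, lam1)
    spike = (1 - theta) * laplace_pdf(beta, lam0)
    return slab / (slab + spike)

# Illustrative (assumed) hyperparameters, with lam0 much larger than lam1.
theta, lam0, lam1 = 0.2, 20.0, 0.1

# Riemann-sum check that the mixture is a proper density.
grid = np.linspace(-200.0, 200.0, 400_001)
total_mass = np.sum(ssl_prior_pdf(grid, theta, lam0, lam1)) * (grid[1] - grid[0])
```

Values near zero are attributed almost entirely to the spike, while values a few spike scales away from zero are attributed almost entirely to the slab, which is what lets the penalty adapt between strong and weak shrinkage.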

BB-SSL introduces further flexibility through "jittered" priors, in which each coefficient is shrunken not towards zero, but towards a random location $\mu_j$, sampled iid from the spike component:

$$\mu_j \overset{\text{iid}}{\sim} \psi_0(\mu) = \frac{\lambda_0}{2} e^{-\lambda_0 |\mu|}.$$

The resulting prior is

$$\pi(\beta_j \mid \theta, \mu_j) = \theta\, \psi_1(\beta_j - \mu_j) + (1 - \theta)\, \psi_0(\beta_j - \mu_j),$$

where $\psi_1(x) = \frac{\lambda_1}{2} e^{-\lambda_1 |x|}$ and $\psi_0(x) = \frac{\lambda_0}{2} e^{-\lambda_0 |x|}$. This construction attenuates the tendency of standard weighted Bayesian bootstrap (WBB) approaches to collapse small effects exactly to zero (Nie et al., 2020).
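The jittering step can be sketched in a few lines: draw one location per coefficient from the Laplace spike, then evaluate the SSL mixture at the shifted argument. The helper names below are hypothetical and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def sample_jitter(p, lam0, rng):
    """Draw p i.i.d. locations from the Laplace spike (rate lam0, scale 1 / lam0)."""
    return rng.laplace(loc=0.0, scale=1.0 / lam0, size=p)

def jittered_log_prior(beta, mu, theta, lam0, lam1):
    """Sum over j of the log SSL mixture evaluated at beta_j - mu_j."""
    d = np.abs(np.asarray(beta, dtype=float) - np.asarray(mu, dtype=float))
    slab = theta * 0.5 * lam1 * np.exp(-lam1 * d)
    spike = (1 - theta) * 0.5 * lam0 * np.exp(-lam0 * d)
    return float(np.sum(np.log(slab + spike)))

rng = np.random.default_rng(0)
mu = sample_jitter(p=5, lam0=20.0, rng=rng)

# The jittered prior peaks at beta = mu rather than at beta = 0.
at_mu = jittered_log_prior(mu, mu, theta=0.2, lam0=20.0, lam1=0.1)
at_zero = jittered_log_prior(np.zeros(5), mu, theta=0.2, lam0=20.0, lam1=0.1)
```

Because the jitter has scale `1 / lam0`, the random centers stay close to zero when the spike is sharp, so the perturbation mainly affects coefficients that would otherwise be thresholded to exactly zero.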

2. Approximate Posterior Sampling via Reweighted MAP Optimization

BB-SSL employs randomized maximum a posteriori (MAP) optimization to generate approximate posterior draws. Each iteration consists of sampling data weights $w = (w_1, \dots, w_n)$ from a Dirichlet distribution and random jitter $\mu = (\mu_1, \dots, \mu_p)$ from the Laplace spike, then solving a penalized weighted regression problem:

  1. Sample $w \sim n \cdot \mathrm{Dir}(\alpha, \dots, \alpha)$ (total mass $\sum_{i=1}^n w_i = n$).
  2. Sample $\mu_j \sim \mathrm{Laplace}(0, 1/\lambda_0)$ for each $j = 1, \dots, p$.
  3. Form the pseudo-likelihood

$$\tilde{L}^{w}(\beta) \propto \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n w_i \,(y_i - x_i^\top \beta)^2 \Big\}$$

and the jittered prior $\pi(\beta \mid \theta, \mu) = \prod_{j=1}^p \pi(\beta_j - \mu_j \mid \theta)$, that is, the SSL prior evaluated at $\beta_j - \mu_j$.

  4. Compute the MAP estimate

$$\tilde{\beta} = \arg\max_{\beta \in \mathbb{R}^p} \Big\{ \log \tilde{L}^{w}(\beta) + \log \pi(\beta \mid \theta, \mu) \Big\}.$$

This reduces to the SSL coordinate-descent algorithm applied to reweighted data ($X^* = W^{1/2} X$, $Y^* = W^{1/2}(Y - X\mu)$, with $W = \mathrm{diag}(w_1, \dots, w_n)$), followed by shifting the solution by $\mu$.
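The reduction to a standard SSL fit can be checked with a change of variables. The symbols $b$ and $W$ below are introduced here for illustration and may differ from the notation used in the paper.

```latex
\begin{aligned}
b &= \beta - \mu, \qquad W = \mathrm{diag}(w_1, \dots, w_n), \\
\sum_{i=1}^n w_i \,\bigl(y_i - x_i^\top \beta\bigr)^2
  &= \sum_{i=1}^n w_i \,\bigl( (y_i - x_i^\top \mu) - x_i^\top b \bigr)^2 \\
  &= \bigl\| W^{1/2}(Y - X\mu) - W^{1/2} X\, b \bigr\|_2^2 .
\end{aligned}
```

Since the jittered prior depends on $\beta$ only through $b = \beta - \mu$, the maximizer $\tilde{b}$ is an ordinary SSL MAP estimate for the response $W^{1/2}(Y - X\mu)$ and design $W^{1/2} X$, and the draw is recovered as $\tilde{\beta} = \tilde{b} + \mu$.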

Each replicate is independent, enabling straightforward parallelization. Optionally, the mixing weight $\theta$ can be updated via its Beta full conditional.
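The sampling loop described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $\theta$ and $\sigma^2$ are held fixed, and the MAP step uses a simplified coordinate descent based on a local linear approximation of the SSL penalty (a weighted-LASSO update with penalty $\lambda^*(b) = \lambda_1 p^*(b) + \lambda_0 (1 - p^*(b))$) in place of the refined thresholding rule of the SSL algorithm. All function names and default values are hypothetical.

```python
import numpy as np

def slab_prob(b, theta, lam0, lam1):
    """Conditional slab probability p*(b), computed in a numerically stable form."""
    ratio = (1 - theta) * lam0 / (theta * lam1)
    return 1.0 / (1.0 + ratio * np.exp(-(lam0 - lam1) * np.abs(b)))

def soft_threshold(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def ssl_map_lla(X, y, theta, lam0, lam1, sigma2=1.0, sweeps=50):
    """Approximate SSL MAP estimate via coordinate descent with a local
    linear approximation of the penalty (a simplification, see lead-in)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()               # residual y - X b
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            r += X[:, j] * b[j]              # remove coordinate j from the fit
            z = X[:, j] @ r
            ps = slab_prob(b[j], theta, lam0, lam1)
            lam_star = lam1 * ps + lam0 * (1 - ps)
            b[j] = soft_threshold(z, sigma2 * lam_star) / col_ss[j]
            r -= X[:, j] * b[j]              # add updated coordinate back
    return b

def bb_ssl_draws(X, y, n_draws, theta, lam0, lam1, alpha=1.0, sigma2=1.0, seed=0):
    """Generate approximate posterior draws by reweighting, jittering, and re-fitting."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    draws = np.empty((n_draws, p))
    for t in range(n_draws):
        w = n * rng.dirichlet(np.full(n, alpha))     # weights with total mass n
        mu = rng.laplace(0.0, 1.0 / lam0, size=p)    # jitter from the spike
        sw = np.sqrt(w)
        X_star = X * sw[:, None]                     # W^{1/2} X
        y_star = sw * (y - X @ mu)                   # W^{1/2} (Y - X mu)
        b = ssl_map_lla(X_star, y_star, theta, lam0, lam1, sigma2)
        draws[t] = b + mu                            # shift back by mu
    return draws
```

A call such as `bb_ssl_draws(X, y, n_draws=1000, theta=0.2, lam0=20.0, lam1=0.1)` returns an array with one row per replicate. Coefficients that the fit thresholds to zero come back equal to their jitter $\mu_j$, so the draws for null effects spread around zero instead of piling up at exactly zero.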

3. Induced Pseudo-Posterior and Theoretical Contraction Rates

The distribution of BB-SSL draws can be characterized as the pushforward of the joint law of $(w, \mu)$ through the weighted MAP operator: $\tilde{\beta} = \tilde{\beta}(w, \mu)$. Under regularity conditions, these draws approximate the actual posterior $\pi(\beta \mid Y)$.

For sparse normal means ($y_i = \beta_i + \epsilon_i$ with Gaussian noise), and weights $w_i$ satisfying appropriate moment and tail conditions, BB-SSL achieves the minimax contraction rate for the posterior mean squared error (Nie et al., 2020). Analogous results hold in high-dimensional regression ($p > n$) under restricted eigenvalue and sparsity assumptions.

For multivariate regression, Bayesian bootstrap variants applied to the multivariate SSL (mSSL) yield contraction for the Frobenius and prediction errors at rates governed by the sparsities of the coefficient and precision matrices (Shen et al., 2022).

4. Computational Complexity and Scalability

The main computational cost for BB-SSL is the repeated coordinate-descent MAP optimization, run once per replicate. Since each replicate is independent, BB-SSL is "embarrassingly parallel": the total cost scales linearly in the number of replicates and can be divided across workers. For comparison, standard Gibbs samplers for SSL or MCMC-based approaches incur their cost per iteration of a sequential chain, and the resulting draws may be serially correlated. Fast Gibbs routines for the horseshoe prior reduce the per-iteration cost, but still lack the parallelism and fail to match the computational efficiency of BB-SSL in the full regime (Nie et al., 2020).
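Because replicates share no state, they can be dispatched to a worker pool with one independent random stream each. The sketch below illustrates the pattern only: `one_replicate` is a hypothetical stand-in that solves a Dirichlet-weighted least-squares problem (the SSL penalty and jitter are omitted for brevity), and a thread pool is assumed for simplicity.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def one_replicate(args):
    """Hypothetical stand-in for a single replicate: weighted least squares
    with Dirichlet weights (no SSL penalty, to keep the sketch short)."""
    X, y, seed_seq = args
    rng = np.random.default_rng(seed_seq)
    n = X.shape[0]
    w = n * rng.dirichlet(np.ones(n))
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], sw * y, rcond=None)
    return coef

def run_replicates(X, y, n_draws, max_workers=4, seed=0):
    """Run independent replicates concurrently, one child seed per replicate."""
    children = np.random.SeedSequence(seed).spawn(n_draws)
    tasks = [(X, y, child) for child in children]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return np.array(list(pool.map(one_replicate, tasks)))
```

Spawning child seeds from a single `SeedSequence` keeps the streams statistically independent and makes the output reproducible regardless of how the work is scheduled. For a pure-Python inner solver, a process pool would likely be the better choice, since threads only help when the heavy work releases the interpreter lock.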

5. Empirical Performance in Simulation and Real Data

BB-SSL closely matches the gold-standard stochastic search variable selection (SSVS) on posterior density estimation and marginal inclusion probabilities in both low- and high-dimensional scenarios. In low-dimensional regression with correlated blocks of predictors, BB-SSL tracks SSVS even for multimodal posteriors, in contrast to weighted Bayesian bootstrap (WBB) methods, which assign zero to many coefficients, and Skinny Gibbs, which underestimates posterior variance. On model selection, BB-SSL typically recovers 99% of the posterior mass, exceeding WBB and Skinny Gibbs.

In high-dimensional settings, BB-SSL maintains robust posterior density estimates and credible intervals. Metrics such as Kullback-Leibler divergence, Jaccard distance, bias in posterior means, and Hamming distance favor BB-SSL or place it on par with Skinny Gibbs, with BB-SSL consistently outperforming WBB. In terms of effective sample size per wall-clock time, BB-SSL dominates, followed by Skinny Gibbs, fast MCMC, and WBB.

Real data analyses demonstrate that BB-SSL produces independent posterior samples at rates orders of magnitude faster than MCMC-based SSVS, with nearly identical marginal posterior densities and inclusion probabilities. For instance, in the Life-Cycle Savings dataset, BB-SSL achieves a higher effective sample size than SSVS (Nie et al., 2020).
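To make the effective-sample-size comparison concrete, the sketch below contrasts independent draws with a serially correlated chain of the same length. The estimator is a simple autocorrelation-based heuristic (truncating at the first non-positive autocorrelation), and both series are synthetic stand-ins; neither is taken from the paper.

```python
import numpy as np

def effective_sample_size(x, max_lag=200):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncating the sum at the
    first non-positive autocorrelation (a simple, common heuristic)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = xc @ xc / n
    tau = 1.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = (xc[:-lag] @ xc[lag:]) / (n * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(0)
n = 5000
independent = rng.normal(size=n)   # stand-in for independent bootstrap replicates
chain = np.empty(n)                # stand-in for a correlated MCMC chain
chain[0] = 0.0
for t in range(1, n):              # AR(1) process with autocorrelation 0.9
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

ess_independent = effective_sample_size(independent)
ess_chain = effective_sample_size(chain)
```

For independent draws the estimate stays close to the nominal number of draws, while the correlated chain is worth only a small fraction of its length, which is why independence matters when comparing samplers per unit of wall-clock time.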

6. Practical Guidelines, Limitations, and Extensions

Recommended settings for BB-SSL include enough perturbations to yield stable credible intervals, with a Dirichlet concentration parameter $\alpha$ satisfying the theoretical lower bound. The recommended practical default is calibrated to the noise level. The regularization parameter $\lambda_0$ should be selected to promote sparsity, with precomputed regularization paths reusable across all replicates.

BB-SSL assumes known noise variance $\sigma^2$, which must be specified or estimated via empirical Bayes. While BB-SSL provides posterior contraction rates and competitive empirical uncertainty quantification, it does not deliver exact frequentist coverage. Extensions to generalized linear models (GLMs) involve customized optimization but retain the same weighted-MAP framework. Open questions include the accuracy of high-dimensional posterior approximation (Bernstein–von Mises refinements) and integration with generative bootstrap schemes for further efficiency gains (Nie et al., 2020).

The Bayesian bootstrap overlay for the multivariate SSL (mSSL) operates via randomized MAP solvers with Gamma-distributed weights and optional random recentering, yielding interval estimates from empirical quantiles of resulting replicates. Simulation studies show that these Bayesian bootstrap intervals are substantially shorter yet achieve frequentist coverage close to nominal values when compared to asymptotic de-biasing intervals, which, though valid, are often 3–10 times longer (Shen et al., 2022). The empirical efficiency and scalability of BB-SSL and its multivariate versions suggest a strong practical advantage for high-dimensional sparse inference with uncertainty quantification in contemporary statistical workflows.
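Interval estimates from empirical quantiles of the replicates can be computed as in the following sketch. The function name is hypothetical, and the replicates here are synthetic Gaussian draws standing in for actual bootstrap output.

```python
import numpy as np

def quantile_intervals(draws, level=0.95):
    """Equal-tailed intervals from the empirical quantiles of the replicates.
    draws has shape (n_draws, p); the result has shape (p, 2)."""
    tail = (1.0 - level) / 2.0
    lower = np.quantile(draws, tail, axis=0)
    upper = np.quantile(draws, 1.0 - tail, axis=0)
    return np.column_stack([lower, upper])

# Synthetic replicates standing in for bootstrap output (assumed values).
rng = np.random.default_rng(0)
draws = rng.normal(loc=[0.0, 2.0], scale=[0.1, 0.5], size=(4000, 2))
intervals = quantile_intervals(draws)
```

Each row of `intervals` holds the lower and upper endpoint for one coefficient. Since the endpoints are tail quantiles, their stability depends on the number of replicates, which is one reason to run more perturbations when narrow tail probabilities are of interest.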
