Spike-and-Slab Priors in Structured Regression

Updated 15 August 2025
  • Spike-and-slab priors are hierarchical Bayesian mixtures that combine a concentrated spike with a diffuse slab to enforce sparsity and enable function selection in regression models.
  • The NMIG construction places the spike-and-slab mixture on block-level variances, and its parameter-expanded variant (peNMIG) improves MCMC mixing by isolating a scalar inclusion parameter from each high-dimensional coefficient block.
  • These priors have been effectively applied in structured additive regression for survival, spatial, and mixed logit models, yielding superior model selection and predictive performance.

Spike-and-slab priors are a class of hierarchical Bayesian mixture priors that enforce sparsity and selective shrinkage in high-dimensional models by mixing a “spike” component—highly concentrated near zero—with a diffuse “slab” component, usually assigned over blocks or groups of parameters or coefficients. In the context of structured additive regression and function selection, spike-and-slab priors are used to determine which model terms or function components are actively contributing to the predictor and which should be excluded by shrinking their coefficients strongly toward zero.
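
As a toy illustration of this mixture idea (not the specific block-level construction discussed below), the following sketch draws scalar coefficients from a two-component normal mixture with a narrow spike near zero and a diffuse slab; the parameter values (`w`, `spike_sd`, `slab_sd`) are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_spike_slab(w=0.2, spike_sd=0.01, slab_sd=2.0, size=10_000):
    """Draw scalar coefficients from a two-component spike-and-slab prior.

    With probability w a coefficient comes from the diffuse slab, otherwise
    from the spike concentrated near zero.  All values are illustrative.
    """
    in_slab = rng.random(size) < w                # inclusion indicators
    sd = np.where(in_slab, slab_sd, spike_sd)     # component-specific scale
    return rng.normal(0.0, sd), in_slab

beta, included = draw_spike_slab()
print(f"slab fraction: {included.mean():.3f}")
print(f"spike sd: {beta[~included].std():.3f}, slab sd: {beta[included].std():.3f}")
```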

1. Structured Additive Regression and the Need for Function Selection

The structured additive regression framework generalizes both Gaussian and non-Gaussian regression models by allowing the linear predictor to consist of a sum of potentially complex functional terms, such as nonlinear covariate effects, spatial surfaces, varying coefficients, and random effects:

$$\eta = \eta_0 + \sum_{j=1}^{p} X_j \beta_j$$

where each term $X_j \beta_j$ encodes a potentially multivariate or nonparametric function of one or more covariates. This flexibility necessitates a mechanism for function selection: (1) inclusion/exclusion of covariates, (2) selection between linear and nonlinear representations or different parameterizations, and (3) detection of interactions and complex effects.

A spike-and-slab prior targets the selection of relevant functions/model terms by imposing strong shrinkage (the spike) on irrelevant components and allowing flexible estimation (the slab) of truly active components.
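
To make the block structure of the predictor concrete, here is a hedged sketch that assembles $\eta$ from a parametric block and a basis-expanded smooth block; the covariates, the simple polynomial basis standing in for a spline basis, and the coefficient values are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)          # covariate with a linear effect
x2 = rng.uniform(-2, 2, size=n)  # covariate with a smooth, nonlinear effect

def poly_basis(x, degree=4):
    """Simple polynomial basis standing in for a spline basis X_j."""
    return np.column_stack([x**k for k in range(1, degree + 1)])

# Design blocks X_j and their coefficient blocks beta_j (placeholder values)
X_blocks = [x1[:, None], poly_basis(x2)]
beta_blocks = [np.array([1.5]), np.array([0.8, -0.4, 0.1, 0.0])]

eta0 = 0.3
eta = eta0 + sum(X @ b for X, b in zip(X_blocks, beta_blocks))
print(eta[:5])
```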

2. Normal Mixture-of-Inverse Gamma (NMIG) Prior: Block Spike-and-Slab Construction

For function selection where model terms are represented as blocks of basis coefficients (e.g., splines or spatial bases), the NMIG prior is constructed as follows:

  • For block $\beta_j$ (of dimension $d_j$), the conditional prior is

$$\beta_j \mid v_j^2 \sim N(0, v_j^2 I)$$

  • The prior variance is decomposed hierarchically:

$$v_j^2 = \gamma_j \tau_j^2$$

where $\gamma_j$ is a two-point indicator taking the values $1$ or $v_0$, and $\tau_j^2 \sim \mathrm{Inv}\text{-}\Gamma(a_\tau, b_\tau)$. The indicator is distributed as

$$\gamma_j \sim w\, \delta_1(\gamma_j) + (1-w)\, \delta_{v_0}(\gamma_j)$$

with $v_0 \ll 1$, so that $\gamma_j = 1$ yields the large-variance "slab" and $\gamma_j = v_0$ yields the "spike" with near-zero variance.

When $v_j^2 \approx 0$, the corresponding function is effectively excluded; when $v_j^2$ is large, it remains active.
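
The following sketch draws a single coefficient block from this NMIG hierarchy, assuming illustrative hyperparameter values for $a_\tau$, $b_\tau$, $w$, and $v_0$ (these are placeholders, not defaults taken from the literature).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def draw_nmig_block(d_j, w=0.5, v0=2.5e-4, a_tau=5.0, b_tau=25.0):
    """Draw one coefficient block beta_j from the NMIG prior hierarchy.

    tau_j^2 ~ Inv-Gamma(a_tau, b_tau)
    gamma_j = 1 with prob. w (slab), else v0 << 1 (spike)
    beta_j | v_j^2 ~ N(0, v_j^2 I) with v_j^2 = gamma_j * tau_j^2
    Hyperparameter values are illustrative placeholders.
    """
    tau2 = stats.invgamma.rvs(a_tau, scale=b_tau, random_state=rng)
    gamma = 1.0 if rng.random() < w else v0
    v2 = gamma * tau2
    beta_j = rng.normal(0.0, np.sqrt(v2), size=d_j)
    return beta_j, gamma

beta_j, gamma = draw_nmig_block(d_j=8)
print("included" if gamma == 1.0 else "excluded", np.round(beta_j, 3))
```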

3. Multiplicative Parameter Expansion (peNMIG): Improving MCMC Mixing

Standard Gibbs updates for coefficient blocks suffer from “stickiness” near the spike due to high posterior correlations. The paper introduces a novel multiplicative expansion:

$$\beta_j = \alpha_j \xi_j$$

where $\alpha_j \in \mathbb{R}$ is a scalar and $\xi_j \in \mathbb{R}^{d_j}$. The prior is specified as:

  • $\alpha_j \sim N(0, v_j^2)$ with $v_j^2 = \gamma_j \tau_j^2$
  • $\xi_{jk} \mid m_{jk} \sim N(m_{jk}, 1)$, with $m_{jk} \sim \frac{1}{2}\delta_1(m_{jk}) + \frac{1}{2}\delta_{-1}(m_{jk})$

Advantages of this expansion:

  • Only the scalar $\alpha_j$ enters the block-wise spike-and-slab mechanism, eliminating the poor MCMC mixing associated with high-dimensional blocks.
  • The marginal prior on $\beta_j$ has strong shrinkage at zero (spike) and heavy tails (slab), yielding $\ell_q$-like penalties for $q < 1$, similar in behavior to horseshoe or normal–Jeffreys priors.
  • The full prior hierarchy, sampled in the sketch below, is:

$$\begin{aligned}
\tau_j^2 &\sim \Gamma^{-1}(a_\tau, b_\tau) \\
\gamma_j &\sim w\, \delta_1(\gamma_j) + (1-w)\, \delta_{v_0}(\gamma_j) \\
\alpha_j \mid \gamma_j, \tau_j^2 &\sim N(0, \gamma_j \tau_j^2) \\
\xi_{jk} \mid m_{jk} &\sim N(m_{jk}, 1), \quad m_{jk} \sim \tfrac{1}{2}\delta_1(m_{jk}) + \tfrac{1}{2}\delta_{-1}(m_{jk})
\end{aligned}$$
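
A minimal sampling sketch of this hierarchy, showing how the scalar $\alpha_j$ carries the spike-and-slab variance while the entries of $\xi_j$ stay near $\pm 1$; hyperparameter values are again illustrative placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def draw_penmig_block(d_j, w=0.5, v0=2.5e-4, a_tau=5.0, b_tau=25.0):
    """Draw beta_j = alpha_j * xi_j from the peNMIG prior hierarchy.

    Only the scalar alpha_j carries the spike-and-slab variance; the
    entries of xi_j are centred at +/-1 so they act mostly as signs.
    Hyperparameter values are illustrative placeholders.
    """
    tau2 = stats.invgamma.rvs(a_tau, scale=b_tau, random_state=rng)
    gamma = 1.0 if rng.random() < w else v0
    alpha = rng.normal(0.0, np.sqrt(gamma * tau2))   # scalar scale
    m = rng.choice([-1.0, 1.0], size=d_j)            # m_jk = +/-1 with prob 1/2
    xi = rng.normal(m, 1.0)                          # xi_jk ~ N(m_jk, 1)
    return alpha * xi

# Repeated prior draws of a single entry show the spike at zero and heavy tails
draws = np.array([draw_penmig_block(d_j=1)[0] for _ in range(5_000)])
print(np.quantile(np.abs(draws), [0.5, 0.9, 0.99]))
```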

4. Posterior Computation: Efficient MCMC via Blocked and Hierarchical Structure

  • For Gaussian models, all conditional distributions have closed forms; thus, a Gibbs sampler can update all parameters efficiently:
    • Blocked updates for $(\alpha_j, \xi_j)$, with $\alpha_j$ controlling inclusion.
    • The indicator $\gamma_j$ and variance $\tau_j^2$ are updated based only on $\alpha_j$, not the full block $\beta_j$, enabling rapid transitions between inclusion and exclusion states (see the sketch after this list).
    • A rescaling step after updating $\xi_j$ keeps its entries close to $\pm 1$ in magnitude, maintaining identifiability.
  • For non-Gaussian likelihoods, the conditional for $(\alpha_j, \xi_j)$ is updated via Metropolis–Hastings with proposals based on iteratively weighted least squares.
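
The scalar-level conditional updates can be sketched as follows, assuming the conjugate forms implied by the hierarchy above; the rescaling convention shown (dividing $\xi_j$ by its mean absolute value and absorbing that scale into $\alpha_j$) is one common choice and may differ in detail from the original implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def update_gamma(alpha, tau2, w=0.5, v0=2.5e-4):
    """Sample the inclusion indicator gamma_j from its full conditional.

    Only the scalar alpha_j enters, so switching between spike (v0) and
    slab (1) stays cheap regardless of the block dimension d_j.
    """
    log_p1 = np.log(w) + stats.norm.logpdf(alpha, 0.0, np.sqrt(tau2))
    log_p0 = np.log(1.0 - w) + stats.norm.logpdf(alpha, 0.0, np.sqrt(v0 * tau2))
    p1 = np.exp(log_p1 - np.logaddexp(log_p0, log_p1))
    return 1.0 if rng.random() < p1 else v0

def update_tau2(alpha, gamma, a_tau=5.0, b_tau=25.0):
    """Conjugate inverse-gamma update for tau_j^2 given alpha_j and gamma_j."""
    return stats.invgamma.rvs(a_tau + 0.5,
                              scale=b_tau + alpha**2 / (2.0 * gamma),
                              random_state=rng)

def rescale(alpha, xi):
    """Move the scale of xi_j into alpha_j so its entries stay near +/-1."""
    s = np.mean(np.abs(xi))
    return alpha * s, xi / s

# Illustrative usage with placeholder values
alpha, tau2 = 0.8, 4.0
gamma = update_gamma(alpha, tau2)
tau2 = update_tau2(alpha, gamma)
alpha, xi = rescale(alpha, rng.normal(rng.choice([-1.0, 1.0], size=6), 1.0))
```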

5. Performance Characteristics, Sensitivity, and Comparative Results

  • In simulation studies and classification benchmarks (e.g., UCI datasets), the peNMIG approach produces models with lower predictive deviance and higher sparsity compared to component-wise boosting, adaptive COSSO (ACOSSO), and frequentist penalization methods.
  • The method is relatively robust to hyperparameter choices such as the prior inclusion probability $w$; the main exception is $v_0$, which strongly affects sparsity by controlling the sharpness of the spike.
  • Recovery of relevant functional terms—true model structure—is highly accurate even with complex high-dimensional predictors and when selecting among functional representations (e.g., linear vs. nonlinear, varying coefficients, spatial effects).
  • Computation is feasible for hundreds of model terms due to the efficiency of the parameter-expanded sampler and is not limited by the curse of dimensionality in block sampling.

6. Applications: Survival, Spatial, and Mixed Model Scenarios

  • Additive piecewise exponential survival models: The hazard function is modeled as

$$\lambda(t, x_i) = \exp\!\left(g_0(j) + \sum_\ell g_\ell(j)\, v_{i\ell}(j) + \sum_m f_m(u_{im}(j)) + z_i(j)' \gamma\right), \qquad t \in (\kappa_{j-1}, \kappa_j]$$

The spike-and-slab approach allows selection among time-varying covariate functions, testing of linearity versus nonlinearity, and inclusion or exclusion of functional and parametric effects; a minimal hazard-evaluation sketch follows this list.

  • Geoadditive regression: For spatial analysis (e.g., Munich rent data), geoadditive smoothers modeled by Gaussian Markov random fields are included as blocks—spike-and-slab selection distinguishes spatial, smooth, and parametric components for interpretability.
  • Additive mixed logit models: For categorical/discrete responses with random effects, the procedure supports selection among random, smooth, and spatial effects while maintaining a parsimonious model.
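
To make the piecewise exponential structure concrete, here is a minimal evaluation sketch in which the full interval-specific predictor is collapsed into a single placeholder value per interval; the interval boundaries and predictor values are invented for illustration.

```python
import numpy as np

# Interval boundaries kappa_0 < kappa_1 < ... define the piecewise-constant grid
kappa = np.array([0.0, 1.0, 2.5, 5.0, 10.0])

def hazard(t, eta_per_interval):
    """Piecewise-exponential hazard: lambda(t) = exp(eta_j) for t in (kappa_{j-1}, kappa_j].

    eta_per_interval[j] stands for the interval-specific linear predictor
    g_0(j) + sum_l g_l(j) v_il(j) + sum_m f_m(u_im(j)) + z_i(j)' gamma,
    collapsed here into a single illustrative number per interval.
    """
    j = np.searchsorted(kappa, t, side="left") - 1   # interval index of t
    return np.exp(eta_per_interval[j])

eta = np.array([-2.0, -1.5, -1.2, -1.0])  # one placeholder value per interval
print(hazard(0.7, eta), hazard(3.0, eta))
```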

7. Summary and Theoretical Implications

  • Combining blockwise spike-and-slab (NMIG) priors with the multiplicative parameter expansion (peNMIG) makes efficient and robust function selection feasible for structured additive regression over complex model spaces.
  • The multiplicative expansion addresses critical MCMC mixing and convergence issues and induces strong shrinkage and heavy-tailed tolerance for large signals, paralleling properties of other state-of-the-art shrinkage priors.
  • Spike-and-slab priors with this architecture provide a unified, computationally tractable framework for function/term selection in additive models with a broad class of effect types and interactions, and are validated by superior empirical model selection and predictive performance on benchmark data.

Summary Table: Key Model Elements in Parameter-Expanded NMIG Spike-and-Slab Prior

| Component | Distribution | Role in Model Selection |
| --- | --- | --- |
| $\gamma_j$ | $w\,\delta_1 + (1-w)\,\delta_{v_0}$ | Inclusion/exclusion of entire block |
| $\tau_j^2$ | $\Gamma^{-1}(a_\tau, b_\tau)$ | Controls block-wise variance magnitude |
| $\alpha_j$ | $N(0, \gamma_j\tau_j^2)$ | Overall scale, drives block activity |
| $\xi_{jk}$ | $N(m_{jk}, 1)$, $m_{jk} = \pm 1$ | Keeps entries near $\pm 1$, identifies structure |