Spike-and-Slab Priors in Structured Regression

Updated 15 August 2025
  • Spike-and-slab priors are hierarchical Bayesian mixtures that combine a concentrated spike with a diffuse slab to enforce sparsity and enable function selection in regression models.
  • The NMIG construction places the spike-and-slab mixture on block-level variances, and its parameter-expanded variant (peNMIG) improves MCMC mixing by isolating a scalar inclusion parameter from each high-dimensional coefficient block.
  • These priors have been effectively applied in structured additive regression for survival, spatial, and mixed logit models, yielding superior model selection and predictive performance.

Spike-and-slab priors are a class of hierarchical Bayesian mixture priors that enforce sparsity and selective shrinkage in high-dimensional models by mixing a “spike” component—highly concentrated near zero—with a diffuse “slab” component, usually assigned over blocks or groups of parameters or coefficients. In the context of structured additive regression and function selection, spike-and-slab priors are used to determine which model terms or function components are actively contributing to the predictor and which should be excluded by shrinking their coefficients strongly toward zero.
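
As a toy illustration of this mixture idea (not the specific block-level construction discussed below), the following sketch draws scalar coefficients from a two-component normal mixture with a narrow spike near zero and a diffuse slab; the parameter values (`w`, `spike_sd`, `slab_sd`) are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_spike_slab(w=0.2, spike_sd=0.01, slab_sd=2.0, size=10_000):
    """Draw scalar coefficients from a two-component spike-and-slab prior.

    With probability w a coefficient comes from the diffuse slab, otherwise
    from the spike concentrated near zero.  All values are illustrative.
    """
    in_slab = rng.random(size) < w                # inclusion indicators
    sd = np.where(in_slab, slab_sd, spike_sd)     # component-specific scale
    return rng.normal(0.0, sd), in_slab

beta, included = draw_spike_slab()
print(f"slab fraction: {included.mean():.3f}")
print(f"spike sd: {beta[~included].std():.3f}, slab sd: {beta[included].std():.3f}")
```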

1. Structured Additive Regression and the Need for Function Selection

The structured additive regression framework generalizes both Gaussian and non-Gaussian regression models by allowing the linear predictor to consist of a sum of potentially complex functional terms, such as nonlinear covariate effects, spatial surfaces, varying coefficients, and random effects:

$$\eta = \eta_0 + \sum_{j=1}^{p} X_j \beta_j$$

where each term $X_j \beta_j$ encodes a potentially multivariate or nonparametric function of one or more covariates. This flexibility necessitates a mechanism for function selection: (1) inclusion/exclusion of covariates, (2) selection between linear and nonlinear representations or different parameterizations, and (3) detection of interactions and complex effects.

A spike-and-slab prior targets the selection of relevant functions/model terms by imposing strong shrinkage (the spike) on irrelevant components and allowing flexible estimation (the slab) of truly active components.
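
To make the block structure of the predictor concrete, here is a hedged sketch that assembles $\eta$ from a parametric block and a basis-expanded smooth block; the covariates, the simple polynomial basis standing in for a spline basis, and the coefficient values are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)          # covariate with a linear effect
x2 = rng.uniform(-2, 2, size=n)  # covariate with a smooth, nonlinear effect

def poly_basis(x, degree=4):
    """Simple polynomial basis standing in for a spline basis X_j."""
    return np.column_stack([x**k for k in range(1, degree + 1)])

# Design blocks X_j and their coefficient blocks beta_j (placeholder values)
X_blocks = [x1[:, None], poly_basis(x2)]
beta_blocks = [np.array([1.5]), np.array([0.8, -0.4, 0.1, 0.0])]

eta0 = 0.3
eta = eta0 + sum(X @ b for X, b in zip(X_blocks, beta_blocks))
print(eta[:5])
```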

2. Normal Mixture-of-Inverse Gamma (NMIG) Prior: Block Spike-and-Slab Construction

For function selection where model terms are represented as blocks of basis coefficients (e.g., splines or spatial bases), the NMIG prior is constructed as follows:

  • For block $\beta_j$ (of dimension $d_j$), the conditional prior is

$$\beta_j \mid v_j^2 \sim N(0, v_j^2 I)$$

  • The prior variance is decomposed hierarchically:

$$v_j^2 = \gamma_j \tau_j^2$$

where $\gamma_j$ is a two-point indicator taking the values $1$ or $v_0$, and $\tau_j^2 \sim \mathrm{Inv}\text{-}\Gamma(a_\tau, b_\tau)$. The indicator is distributed as

$$\gamma_j \sim w\, \delta_1(\gamma_j) + (1-w)\, \delta_{v_0}(\gamma_j)$$

with $v_0 \ll 1$, so that $\gamma_j = 1$ yields the large-variance "slab" and $\gamma_j = v_0$ yields the "spike" with near-zero variance.

When $v_j^2 \approx 0$, the corresponding function is effectively excluded; when $v_j^2$ is large, it remains active.
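
The following sketch draws a single coefficient block from this NMIG hierarchy, assuming illustrative hyperparameter values for $a_\tau$, $b_\tau$, $w$, and $v_0$ (these are placeholders, not defaults taken from the literature).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def draw_nmig_block(d_j, w=0.5, v0=2.5e-4, a_tau=5.0, b_tau=25.0):
    """Draw one coefficient block beta_j from the NMIG prior hierarchy.

    tau_j^2 ~ Inv-Gamma(a_tau, b_tau)
    gamma_j = 1 with prob. w (slab), else v0 << 1 (spike)
    beta_j | v_j^2 ~ N(0, v_j^2 I) with v_j^2 = gamma_j * tau_j^2
    Hyperparameter values are illustrative placeholders.
    """
    tau2 = stats.invgamma.rvs(a_tau, scale=b_tau, random_state=rng)
    gamma = 1.0 if rng.random() < w else v0
    v2 = gamma * tau2
    beta_j = rng.normal(0.0, np.sqrt(v2), size=d_j)
    return beta_j, gamma

beta_j, gamma = draw_nmig_block(d_j=8)
print("included" if gamma == 1.0 else "excluded", np.round(beta_j, 3))
```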

3. Multiplicative Parameter Expansion (peNMIG): Improving MCMC Mixing

Standard Gibbs updates for coefficient blocks suffer from “stickiness” near the spike due to high posterior correlations. The paper introduces a novel multiplicative expansion:

$$\beta_j = \alpha_j \xi_j$$

where $\alpha_j \in \mathbb{R}$ is a scalar and $\xi_j \in \mathbb{R}^{d_j}$. The prior is specified as:

  • $\alpha_j \sim N(0, v_j^2)$ with $v_j^2 = \gamma_j \tau_j^2$
  • $\xi_{jk} \mid m_{jk} \sim N(m_{jk}, 1)$, with $m_{jk} \sim \frac{1}{2}\delta_1(m_{jk}) + \frac{1}{2}\delta_{-1}(m_{jk})$

Advantages of this expansion:

  • Only the scalar $\alpha_j$ enters the block-wise spike-and-slab mechanism, eliminating the poor MCMC mixing associated with high-dimensional blocks.
  • The marginal prior on $\beta_j$ has strong shrinkage at zero (spike) and heavy tails (slab), yielding $\ell_q$-like penalties for $q < 1$, similar in behavior to horseshoe or normal–Jeffreys priors.
  • The full prior hierarchy, sampled in the sketch below, is:

$$\begin{aligned}
\tau_j^2 &\sim \Gamma^{-1}(a_\tau, b_\tau) \\
\gamma_j &\sim w\, \delta_1(\gamma_j) + (1-w)\, \delta_{v_0}(\gamma_j) \\
\alpha_j \mid \gamma_j, \tau_j^2 &\sim N(0, \gamma_j \tau_j^2) \\
\xi_{jk} \mid m_{jk} &\sim N(m_{jk}, 1), \quad m_{jk} \sim \tfrac{1}{2}\delta_1(m_{jk}) + \tfrac{1}{2}\delta_{-1}(m_{jk})
\end{aligned}$$
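
A minimal sampling sketch of this hierarchy, showing how the scalar $\alpha_j$ carries the spike-and-slab variance while the entries of $\xi_j$ stay near $\pm 1$; hyperparameter values are again illustrative placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def draw_penmig_block(d_j, w=0.5, v0=2.5e-4, a_tau=5.0, b_tau=25.0):
    """Draw beta_j = alpha_j * xi_j from the peNMIG prior hierarchy.

    Only the scalar alpha_j carries the spike-and-slab variance; the
    entries of xi_j are centred at +/-1 so they act mostly as signs.
    Hyperparameter values are illustrative placeholders.
    """
    tau2 = stats.invgamma.rvs(a_tau, scale=b_tau, random_state=rng)
    gamma = 1.0 if rng.random() < w else v0
    alpha = rng.normal(0.0, np.sqrt(gamma * tau2))   # scalar scale
    m = rng.choice([-1.0, 1.0], size=d_j)            # m_jk = +/-1 with prob 1/2
    xi = rng.normal(m, 1.0)                          # xi_jk ~ N(m_jk, 1)
    return alpha * xi

# Repeated prior draws of a single entry show the spike at zero and heavy tails
draws = np.array([draw_penmig_block(d_j=1)[0] for _ in range(5_000)])
print(np.quantile(np.abs(draws), [0.5, 0.9, 0.99]))
```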

4. Posterior Computation: Efficient MCMC via Blocked and Hierarchical Structure

  • For Gaussian models, all conditional distributions have closed forms; thus, a Gibbs sampler can update all parameters efficiently:
    • Blocked updates for $(\alpha_j, \xi_j)$, with $\alpha_j$ controlling inclusion.
    • The indicator $\gamma_j$ and variance $\tau_j^2$ are updated based only on $\alpha_j$, not the full block $\beta_j$, enabling rapid transitions between inclusion and exclusion states (see the sketch after this list).
    • A rescaling step after updating $\xi_j$ keeps its entries close to $\pm 1$ in magnitude, maintaining identifiability.
  • For non-Gaussian likelihoods, the conditional for $(\alpha_j, \xi_j)$ is updated via Metropolis–Hastings with proposals based on iteratively weighted least squares.
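
The scalar-level conditional updates can be sketched as follows, assuming the conjugate forms implied by the hierarchy above; the rescaling convention shown (dividing $\xi_j$ by its mean absolute value and absorbing that scale into $\alpha_j$) is one common choice and may differ in detail from the original implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def update_gamma(alpha, tau2, w=0.5, v0=2.5e-4):
    """Sample the inclusion indicator gamma_j from its full conditional.

    Only the scalar alpha_j enters, so switching between spike (v0) and
    slab (1) stays cheap regardless of the block dimension d_j.
    """
    log_p1 = np.log(w) + stats.norm.logpdf(alpha, 0.0, np.sqrt(tau2))
    log_p0 = np.log(1.0 - w) + stats.norm.logpdf(alpha, 0.0, np.sqrt(v0 * tau2))
    p1 = np.exp(log_p1 - np.logaddexp(log_p0, log_p1))
    return 1.0 if rng.random() < p1 else v0

def update_tau2(alpha, gamma, a_tau=5.0, b_tau=25.0):
    """Conjugate inverse-gamma update for tau_j^2 given alpha_j and gamma_j."""
    return stats.invgamma.rvs(a_tau + 0.5,
                              scale=b_tau + alpha**2 / (2.0 * gamma),
                              random_state=rng)

def rescale(alpha, xi):
    """Move the scale of xi_j into alpha_j so its entries stay near +/-1."""
    s = np.mean(np.abs(xi))
    return alpha * s, xi / s

# Illustrative usage with placeholder values
alpha, tau2 = 0.8, 4.0
gamma = update_gamma(alpha, tau2)
tau2 = update_tau2(alpha, gamma)
alpha, xi = rescale(alpha, rng.normal(rng.choice([-1.0, 1.0], size=6), 1.0))
```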

5. Performance Characteristics, Sensitivity, and Comparative Results

  • In simulation studies and classification benchmarks (e.g., UCI datasets), the peNMIG approach produces models with lower predictive deviance and higher sparsity compared to component-wise boosting, adaptive COSSO (ACOSSO), and frequentist penalization methods.
  • The method is relatively robust to hyperparameter choices such as the prior inclusion probability $w$; the main exception is $v_0$, which strongly affects sparsity by controlling the sharpness of the spike.
  • Recovery of relevant functional terms—true model structure—is highly accurate even with complex high-dimensional predictors and when selecting among functional representations (e.g., linear vs. nonlinear, varying coefficients, spatial effects).
  • Computation is feasible for hundreds of model terms due to the efficiency of the parameter-expanded sampler and is not limited by the curse of dimensionality in block sampling.

6. Applications: Survival, Spatial, and Mixed Model Scenarios

  • Additive piecewise exponential survival models: The hazard function is modeled as

$$\lambda(t, x_i) = \exp\!\left(g_0(j) + \sum_\ell g_\ell(j)\, v_{i\ell}(j) + \sum_m f_m(u_{im}(j)) + z_i(j)' \gamma\right), \qquad t \in (\kappa_{j-1}, \kappa_j]$$

The spike-and-slab approach allows selection among time-varying covariate functions, testing of linearity versus nonlinearity, and inclusion or exclusion of functional and parametric effects; a minimal hazard-evaluation sketch follows this list.

  • Geoadditive regression: For spatial analysis (e.g., Munich rent data), geoadditive smoothers modeled by Gaussian Markov random fields are included as blocks—spike-and-slab selection distinguishes spatial, smooth, and parametric components for interpretability.
  • Additive mixed logit models: For categorical/discrete responses with random effects, the procedure supports selection among random, smooth, and spatial effects while maintaining a parsimonious model.
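
To make the piecewise exponential structure concrete, here is a minimal evaluation sketch in which the full interval-specific predictor is collapsed into a single placeholder value per interval; the interval boundaries and predictor values are invented for illustration.

```python
import numpy as np

# Interval boundaries kappa_0 < kappa_1 < ... define the piecewise-constant grid
kappa = np.array([0.0, 1.0, 2.5, 5.0, 10.0])

def hazard(t, eta_per_interval):
    """Piecewise-exponential hazard: lambda(t) = exp(eta_j) for t in (kappa_{j-1}, kappa_j].

    eta_per_interval[j] stands for the interval-specific linear predictor
    g_0(j) + sum_l g_l(j) v_il(j) + sum_m f_m(u_im(j)) + z_i(j)' gamma,
    collapsed here into a single illustrative number per interval.
    """
    j = np.searchsorted(kappa, t, side="left") - 1   # interval index of t
    return np.exp(eta_per_interval[j])

eta = np.array([-2.0, -1.5, -1.2, -1.0])  # one placeholder value per interval
print(hazard(0.7, eta), hazard(3.0, eta))
```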

7. Summary and Theoretical Implications

  • Combining blockwise spike-and-slab (NMIG) priors with the multiplicative parameter expansion (peNMIG) makes efficient and robust function selection feasible for structured additive regression over complex model spaces.
  • The multiplicative expansion addresses critical MCMC mixing and convergence issues and induces strong shrinkage and heavy-tailed tolerance for large signals, paralleling properties of other state-of-the-art shrinkage priors.
  • Spike-and-slab priors with this architecture provide a unified, computationally tractable framework for function/term selection in additive models with a broad class of effect types and interactions, and are validated by superior empirical model selection and predictive performance on benchmark data.

Summary Table: Key Model Elements in Parameter-Expanded NMIG Spike-and-Slab Prior

| Component | Distribution | Role in Model Selection |
| --- | --- | --- |
| $\gamma_j$ | $w\,\delta_1 + (1-w)\,\delta_{v_0}$ | Inclusion/exclusion of entire block |
| $\tau_j^2$ | $\Gamma^{-1}(a_\tau, b_\tau)$ | Controls block-wise variance magnitude |
| $\alpha_j$ | $N(0, \gamma_j\tau_j^2)$ | Overall scale, drives block activity |
| $\xi_{jk}$ | $N(m_{jk}, 1)$, $m_{jk} = \pm 1$ | Keeps entries near $\pm 1$, identifies structure |