Spike-and-Slab Changepoint Selection
- The paper demonstrates that spike-and-slab priors enable accurate detection and localization of changepoints with consistency guarantees under varying noise conditions.
- Methodologies such as joint Gibbs sampling and marginal solo approaches balance computational tractability with robust statistical inference in both univariate and dynamic regression settings.
- Empirical results show that the spike-and-slab framework outperforms frequentist methods in accuracy and scalability, especially in high-dimensional and heavy-tailed noise scenarios.
Spike-and-slab priors provide a principled Bayesian framework for selecting changepoints in time series and regression models by distinguishing between signal and noise via a binary latent indicator structure. These methodologies enable consistent estimation of both the number and locations of changepoints and are notable for balancing computational tractability, model flexibility, and robust statistical guarantees under a broad range of noise conditions.
1. Model Foundations and Spike-and-Slab Priors
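The canonical setting for Bayesian changepoint analysis with spike-and-slab priors assumes a univariate time series $y_t = \theta_t + \varepsilon_t$, $t = 1, \dots, T$, where $\varepsilon_t$ is zero-mean noise with scale $\sigma$. The signal $\theta = (\theta_1, \dots, \theta_T)$ is modeled as piecewise constant, with $K$ unknown changepoints $t_1 < \dots < t_K$. Segment means define $\theta_t$, and changepoints correspond to jumps in the increments $\Delta_t = \theta_t - \theta_{t-1}$. An auxiliary binary sequence encodes changepoint structure: $Z_t = 1$ marks a changepoint at time $t$ and $Z_t = 0$ marks none. Spike-and-slab priors are placed on $\Delta_t$ via these indicators:
- $Z_t \sim \mathrm{Bernoulli}(q)$, with $q \in (0, 1)$.
- Conditional on $Z_t$:
$\Delta_t \mid Z_t = 0 \sim N(0, \sigma^2 \tau_0^2)$ (spike)
$\Delta_t \mid Z_t = 1 \sim N(0, \sigma^2 \tau_1^2)$ (slab)
where $\tau_0^2$ is small and $\tau_1^2$ is large, so that $\tau_0^2 \ll \tau_1^2$.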
The spike imposes strong shrinkage to zero (favoring continuity), while the slab allows for large jumps (detecting changepoints) (Cappello et al., 2021).
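To make the generative structure concrete, the following is a minimal sketch that draws indicators, increments, a piecewise (nearly) constant signal, and noisy data from a Gaussian spike-and-slab prior. The function name, the parameter names (`q`, `tau0`, `tau1`, `sigma`), and the choice of Gaussian noise are assumptions made for illustration and are not taken from the cited papers.

```python
import numpy as np

def sample_spike_slab_signal(T, q, tau0, tau1, sigma=1.0, seed=0):
    """Draw (z, delta, theta, y) from an assumed Gaussian spike-and-slab prior."""
    rng = np.random.default_rng(seed)
    z = rng.random(T) < q                    # z[t] = True marks a changepoint
    scale = sigma * np.where(z, tau1, tau0)  # slab sd where z is True, spike sd otherwise
    delta = rng.normal(0.0, scale)           # increments of the signal
    theta = np.cumsum(delta)                 # signal is the running sum of increments
    y = theta + rng.normal(0.0, sigma, size=T)
    return z, delta, theta, y

z, delta, theta, y = sample_spike_slab_signal(T=200, q=0.05, tau0=0.01, tau1=5.0)
```

With a small spike scale, the increments outside changepoints are close to zero, so `theta` is nearly constant between the positions where `z` is true.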
For regression settings, a dynamic spike-and-slab prior can be hierarchically placed on time-varying regression coefficients, with binary indicators controlling switching between spike and slab regimes on each coefficient at each time (Uribe et al., 2020).
2. Posterior Inference Algorithms
Two primary algorithmic frameworks have been proposed:
A. Joint Gibbs Sampling (basad.cp)
The MCMC strategy jointly samples from the posterior:
- At each iteration, the increments $\Delta_t$ and the indicators $Z_t$ are updated for all $t$.
- Provides high accuracy but mixes slowly as the series length $T$ grows, with computational cost rising accordingly.
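A joint Gibbs sampler of this kind can be sketched as follows. This is a generic stochastic-search sampler written for illustration, not the basad.cp implementation: it assumes a known noise scale `sigma`, Gaussian spike and slab components with standard deviations `sigma * tau0` and `sigma * tau1`, a prior inclusion probability `q`, and it treats the first increment as the initial level (always assigned to the slab). All names and default values are hypothetical.

```python
import numpy as np

def gibbs_changepoints(y, q=0.05, tau0=0.1, tau1=5.0, sigma=1.0,
                       n_iter=500, burn=100, seed=0):
    """Return estimated posterior inclusion probabilities for each time index."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.tril(np.ones((T, T)))       # theta = X @ delta (cumulative sums)
    XtX, Xty = X.T @ X, X.T @ y
    z = np.zeros(T, dtype=bool)
    z[0] = True                        # initial level: always in the slab
    incl = np.zeros(T)
    for it in range(n_iter):
        # delta | z, y is Gaussian with covariance sigma^2 * prec^{-1}
        prior_prec = 1.0 / np.where(z, tau1, tau0) ** 2
        prec = XtX + np.diag(prior_prec)
        L = np.linalg.cholesky(prec)   # prec = L @ L.T
        mean = np.linalg.solve(prec, Xty)
        delta = mean + sigma * np.linalg.solve(L.T, rng.standard_normal(T))
        # z_t | delta_t is Bernoulli, comparing slab and spike densities
        log_slab = np.log(q) - np.log(tau1) - 0.5 * (delta / (sigma * tau1)) ** 2
        log_spike = np.log1p(-q) - np.log(tau0) - 0.5 * (delta / (sigma * tau0)) ** 2
        p1 = 1.0 / (1.0 + np.exp(log_spike - log_slab))
        z = rng.random(T) < p1
        z[0] = True
        if it >= burn:
            incl += z
    return incl / (n_iter - burn)
```

Each sweep factorizes a dense T-by-T matrix, which illustrates why samplers of this type become expensive for long series; indices adjacent to a true changepoint can also retain elevated inclusion frequencies, which is one motivation for post-processing the output.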
B. Marginal Solo Spike-and-Slab (solo.cp)
This approach considers one candidate changepoint $j$ at a time:
- A spike-and-slab prior is placed on $\Delta_j$; all other increments receive conjugate Gaussian priors.
- Marginalization over nuisance parameters leads to a closed-form two-component Gaussian mixture for the marginal posterior of $\Delta_j$:
$\Delta_j \mid y \sim w_j \, N(\mu_{1j}, s_{1j}^2) + (1 - w_j) \, N(\mu_{0j}, s_{0j}^2)$, where $w_j$ is the posterior weight of the slab component and $(\mu_{kj}, s_{kj}^2)$ are the component means and variances.
- The posterior inclusion probability $P(Z_j = 1 \mid y)$ is directly computed using mixture weights and prior probabilities.
- The solo.cp algorithm entails a forward recursion shared by all candidates $j$ and a separate backward pass per candidate, with no reliance on MCMC.
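The way a two-component Gaussian mixture yields an inclusion probability in closed form can be illustrated with a scalar conjugate stand-in. The sketch below is not the solo.cp recursion: it assumes a single noisy summary `d` of one increment with known variance `s2`, and a prior that mixes `N(0, v1)` (slab, weight `q`) with `N(0, v0)` (spike). All names are hypothetical.

```python
import numpy as np

def norm_logpdf(x, var):
    """Log density of N(0, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + x ** 2 / var)

def inclusion_probability(d, s2, q, v0, v1):
    """P(Z = 1 | d) when d | Delta ~ N(Delta, s2) and Delta has a spike-and-slab prior."""
    log_slab = np.log(q) + norm_logpdf(d, s2 + v1)       # marginal of d under the slab
    log_spike = np.log1p(-q) + norm_logpdf(d, s2 + v0)   # marginal of d under the spike
    return 1.0 / (1.0 + np.exp(log_spike - log_slab))

def posterior_mixture(d, s2, q, v0, v1):
    """Weights, means and variances of the posterior mixture (spike first, slab second)."""
    w1 = inclusion_probability(d, s2, q, v0, v1)
    shrink = np.array([v0 / (v0 + s2), v1 / (v1 + s2)])  # conjugate shrinkage factors
    return np.array([1.0 - w1, w1]), shrink * d, shrink * s2

weights, means, variances = posterior_mixture(d=4.0, s2=1.0, q=0.05, v0=0.01, v1=25.0)
```

The inclusion probability is simply the posterior weight of the slab component, so no sampling is needed once the marginal likelihoods under each component are available.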
For dynamic linear models, a state-space structure is employed, and posterior inference is performed using a Gibbs sampler that alternates between Forward-Filtering Backward-Sampling (FFBS) for coefficients and block updates of spike/slab indicators via schemes such as the Gerlach–Carter–Kohn algorithm for efficiency (Uribe et al., 2020).
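As a rough illustration of the FFBS step, the sketch below draws one path of a scalar time-varying coefficient in a local-level model, with a state innovation variance `W` that may differ across time (as it would when spike or slab regimes are active). The model form, the names `V`, `W`, `m0`, `C0`, and the restriction to a single coefficient are simplifying assumptions, not the specification used by Uribe et al. (2020).

```python
import numpy as np

def ffbs_local_level(y, V, W, m0=0.0, C0=10.0, rng=None):
    """One FFBS draw of beta for y_t = beta_t + e_t, beta_t = beta_{t-1} + w_t."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    W = np.broadcast_to(W, (T,))        # allow a scalar or a per-time variance
    m, C, a, R = (np.empty(T) for _ in range(4))
    m_prev, C_prev = m0, C0
    for t in range(T):                  # forward Kalman filter
        a[t], R[t] = m_prev, C_prev + W[t]
        K = R[t] / (R[t] + V)           # Kalman gain
        m[t] = a[t] + K * (y[t] - a[t])
        C[t] = (1.0 - K) * R[t]
        m_prev, C_prev = m[t], C[t]
    beta = np.empty(T)
    beta[-1] = rng.normal(m[-1], np.sqrt(C[-1]))
    for t in range(T - 2, -1, -1):      # backward sampling
        B = C[t] / R[t + 1]
        h = m[t] + B * (beta[t + 1] - a[t + 1])
        H = C[t] * (1.0 - B)
        beta[t] = rng.normal(h, np.sqrt(H))
    return beta
```

Both passes visit each time point once, so a single draw costs time linear in the series length.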
3. Estimation of Changepoints and Model Selection
The spike-and-slab formalism enables summary statistics for changepoint selection:
- Posterior inclusion probabilities $\hat p_t = P(Z_t = 1 \mid y)$ are computed.
- A median-probability model selects raw changepoint candidates as $\{t : \hat p_t > 1/2\}$.
- To mitigate spurious detection of consecutive changepoints, close indices (within a user-specified distance) are clustered, retaining only the member with the highest $\hat p_t$ in each cluster.
- The estimated changepoint set and its size are $\hat{\mathcal{C}}$ and $\hat K = |\hat{\mathcal{C}}|$.
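The selection steps above can be sketched as a small post-processing routine. The grouping rule used here (candidates are merged into one cluster when consecutive candidates are at most `min_gap` apart) is an assumption about how "close" is defined; the function and parameter names are hypothetical.

```python
import numpy as np

def select_changepoints(pip, threshold=0.5, min_gap=3):
    """Median-probability selection followed by clustering of nearby candidates."""
    pip = np.asarray(pip, dtype=float)
    candidates = np.flatnonzero(pip > threshold)   # raw candidates
    if candidates.size == 0:
        return []
    # start a new cluster wherever consecutive candidates are more than min_gap apart
    breaks = np.flatnonzero(np.diff(candidates) > min_gap) + 1
    clusters = np.split(candidates, breaks)
    # keep the member with the highest inclusion probability in each cluster
    return [int(c[np.argmax(pip[c])]) for c in clusters]

pip = np.zeros(50)
pip[[10, 11, 12]] = [0.6, 0.9, 0.7]
pip[30], pip[40] = 0.8, 0.4
print(select_changepoints(pip))  # [11, 30]
```

The estimated number of changepoints is then the length of the returned list.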
For regression models, the posterior probability of regime change summarizes the likelihood of a changepoint in the shrinkage regime for each coefficient and time index (Uribe et al., 2020).
4. Theoretical Guarantees
Rigorous model selection properties are established under specific asymptotic regimes:
- If the true minimal jump size and the minimal spacing between changepoints are sufficiently large, and the prior hyperparameters are scaled suitably with the series length, then the MAP estimator is consistent for the changepoints and attains a near-optimal localization rate (Cappello et al., 2021).
Single-changepoint regimes require a weaker signal-to-noise condition for near-optimal location accuracy.
5. Computational Complexity and Scalability
The solo.cp algorithm achieves significant gains in scalability over traditional Bayesian MCMC:
- Its total runtime is governed by the number of timepoints $T$ alone, with no sampling or mixing concerns; implementations with $T$ in the low thousands complete in seconds to minutes on a single CPU.
- The basad.cp variant requires full runs of Gibbs sampling and may take multiple hours on long series (Cappello et al., 2021).
For dynamic models with $p$ regressors, one FFBS pass is linear in the series length $T$ (under a diagonal variance assumption), and the remaining parameter updates can be parallelized or vectorized (Uribe et al., 2020).
6. Empirical Assessment and Robustness
Empirical benchmarks validate the spike-and-slab changepoint selection framework:
- On simulated signals (BLOCKS and TEETH) and under various noise models (including Gaussian, Laplace, and mixture-Gaussian), the solo.cp method matches or exceeds the accuracy of state-of-the-art frequentist approaches (wbs, smuce, pelt, r-fpop), especially under heavy-tailed or contaminated noise.
- Frequentist methods typically overestimate the number of changepoints in the presence of outliers, while spike-and-slab approaches enforce parsimony.
- On real data applications such as aCGH microarray and ion-channel recordings, Bayesian spike-and-slab methods yield plausible, interpretable segmentation while being less sensitive to hyperparameter tuning than frequentist benchmarks (Cappello et al., 2021).
7. Extension to Dynamic Regression and Markov Switching
In dynamic linear regression, spike-and-slab priors capture time-varying sparsity of regression coefficients:
- Binary indicators, generated from two-state Markov chains, allow regime switching in the shrinkage variance of coefficients, enabling block-structured or persistent sparsity patterns.
- Full conditional updates and forward-backward sampling algorithms (e.g., Gerlach–Carter–Kohn) permit efficient posterior computation and flexible changepoint analysis across multiple predictors and time (Uribe et al., 2020).
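The persistence induced by Markov-chain indicators can be illustrated with a short simulation. The sketch assumes a two-state chain with hypothetical staying probabilities `p_stay_spike` and `p_stay_slab`, and a random-walk coefficient whose innovation standard deviation switches between `sd_spike` and `sd_slab`; this is one plausible reading of the prior, not the exact specification in Uribe et al. (2020).

```python
import numpy as np

def simulate_markov_indicators(T, p_stay_spike=0.95, p_stay_slab=0.9, seed=0):
    """Simulate a two-state Markov chain of indicators (0 = spike, 1 = slab)."""
    rng = np.random.default_rng(seed)
    gamma = np.zeros(T, dtype=int)      # the chain starts in the spike state
    for t in range(1, T):
        stay = p_stay_slab if gamma[t - 1] == 1 else p_stay_spike
        gamma[t] = gamma[t - 1] if rng.random() < stay else 1 - gamma[t - 1]
    return gamma

def simulate_coefficient(gamma, sd_spike=0.01, sd_slab=1.0, seed=1):
    """Random-walk coefficient whose innovation sd depends on the current regime."""
    rng = np.random.default_rng(seed)
    sd = np.where(gamma == 1, sd_slab, sd_spike)
    return np.cumsum(rng.normal(0.0, sd))

gamma = simulate_markov_indicators(200)
beta = simulate_coefficient(gamma)
```

High staying probabilities produce long runs in each regime, so the coefficient is nearly frozen over spike stretches and moves freely over slab stretches.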
This formalism extends changepoint concepts to high-dimensional, temporally structured variable selection, systematically connecting classical changepoint analysis, modern Bayesian regression, and dynamic sparsity modeling.
For comprehensive methodological exposition, theoretical guarantees, and empirical details, see (Cappello et al., 2021) and (Uribe et al., 2020).