Spike-and-Slab Sparse Coding (S3C)
- Spike-and-Slab Sparse Coding (S3C) is a probabilistic latent variable model that combines spike-and-slab priors with directed sparse coding to control sparsity and amplitude.
- It employs a structured variational EM procedure with parallel, GPU-friendly updates to efficiently learn features and decompose signals.
- S3C achieves competitive performance in low-label and transfer learning scenarios, demonstrating state-of-the-art accuracy in image classification tasks.
Spike-and-Slab Sparse Coding (S3C) is a probabilistic latent variable model combining spike-and-slab priors with a directed sparse coding architecture. It forms a highly regularized framework for unsupervised feature learning and signal decomposition, enabling decoupled control over sparsity and magnitude of latent activations. S3C has been demonstrated to provide state-of-the-art feature representations, especially in low-label and transfer-learning regimes, and admits scalable variational inference procedures well-suited for GPU acceleration (Goodfellow et al., 2012).
1. Generative Model: Architecture and Priors
S3C models each observed data vector $v \in \mathbb{R}^D$ as generated by $N$ latent "spike-and-slab" units. For each factor $i = 1, \dots, N$:
- Spike prior: Each binary spike variable $h_i \in \{0, 1\}$ is drawn independently from a Bernoulli, $p(h_i = 1) = \sigma(b_i)$,
with $\sigma$ denoting the logistic sigmoid and $b_i$ a learned bias.
- Slab prior: Given $h_i$, the real-valued slab $s_i$ is Gaussian: $p(s_i \mid h_i) = \mathcal{N}(s_i;\ h_i \mu_i,\ \alpha_i^{-1})$,
where $\mu_i$ is the slab mean (when the spike is active) and $\alpha_i$ is its precision.
- Observation model: The visible data is generated as $p(v \mid h, s) = \mathcal{N}(v;\ W(h \circ s),\ \beta^{-1})$,
where $W \in \mathbb{R}^{D \times N}$ is a dictionary, $\circ$ denotes the elementwise product, and $\beta$ is the noise precision (often isotropic or diagonal).
The full joint is $p(v, h, s) = p(v \mid h, s) \prod_{i=1}^{N} p(s_i \mid h_i)\, p(h_i)$.
The spike variable $h_i$ gates the contribution of $s_i$ to the reconstruction, yielding strict control over sparsity, while the slab provides amplitude modulation (Goodfellow et al., 2012).
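The generative process can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation; the function name and parameter shapes are my own choices.

```python
import numpy as np

def sample_s3c(W, b, mu, alpha, beta, rng=None):
    """Draw one visible vector from the S3C generative model (sketch).

    W     : (D, N) dictionary matrix
    b     : (N,) spike biases; mu, alpha : (N,) slab means and precisions
    beta  : (D,) diagonal noise precision
    """
    rng = np.random.default_rng() if rng is None else rng
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    h = rng.random(b.shape) < sigmoid(b)               # spike: Bernoulli(sigma(b_i))
    s = rng.normal(h * mu, 1.0 / np.sqrt(alpha))       # slab: Gaussian, mean h_i * mu_i
    v = rng.normal(W @ (h * s), 1.0 / np.sqrt(beta))   # visible: Gaussian around W(h o s)
    return v, h, s
```

Note that the spike gates both the slab mean and the reconstruction: when `h[i]` is 0, unit `i` contributes nothing to `W @ (h * s)` regardless of the slab value.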
2. Approximate Inference: Structured Variational EM
Exact posterior inference for $p(h, s \mid v)$ is intractable due to the explaining-away interactions among spikes. S3C employs a structured mean-field variational posterior of the form
$$Q(h, s) = \prod_{i} Q(h_i, s_i),$$
where each $Q(h_i, s_i)$ keeps the spike and its slab tightly coupled, but the posterior factors across units $i$. The optimal form is given by $Q(h_i, s_i) = Q(h_i)\, Q(s_i \mid h_i)$,
with spike probabilities $\hat{h}_i = Q(h_i = 1)$ and conditional slab means $\hat{s}_i = \mathbb{E}_Q[s_i \mid h_i = 1]$ as variational parameters.
Fixed-point updates for these parameters are:
- Slab-mean update: $\hat{s}_i \leftarrow \dfrac{\alpha_i \mu_i + W_i^\top \beta\, r_i}{\alpha_i + W_i^\top \beta\, W_i}$
- Spike-probability update: $\hat{h}_i \leftarrow \sigma\!\left(b_i + \tfrac{1}{2}\log\dfrac{\alpha_i}{\alpha_i + W_i^\top \beta W_i} + \dfrac{\left(\alpha_i \mu_i + W_i^\top \beta\, r_i\right)^2}{2\left(\alpha_i + W_i^\top \beta W_i\right)} - \tfrac{1}{2}\alpha_i \mu_i^2\right)$
with residual $r_i = v - \sum_{j \neq i} W_j \hat{h}_j \hat{s}_j$ (Goodfellow et al., 2012). Updates employ parallelization, damping, and clipping for numerical stability—enabling fully vectorized GPU implementations.
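A NumPy sketch of these fixed-point updates follows. It is one plausible vectorized realization under the standard mean-field derivation; the paper's exact damping and clipping schedule differs, and all variable names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def e_step(v, W, b, mu, alpha, beta, n_iter=20, damp=0.5):
    """Parallel mean-field fixed-point updates for one data vector (sketch).

    Returns hhat (spike probabilities Q(h_i=1)) and shat (slab means
    E_Q[s_i | h_i=1]) for all N units.
    """
    D, N = W.shape
    hhat = np.full(N, 0.5)
    shat = mu.astype(float).copy()
    WtbW = np.einsum('di,d,di->i', W, beta, W)    # W_i^T beta W_i, one scalar per unit
    prec = alpha + WtbW                            # posterior slab precision (fixed)
    for _ in range(n_iter):
        recon = W @ (hhat * shat)
        # r_i = v - sum_{j != i} W_j hhat_j shat_j, computed for all i at once:
        # column i of R adds unit i's own contribution back to the full residual.
        R = (v - recon)[:, None] + W * (hhat * shat)
        z = alpha * mu + np.einsum('di,d,di->i', W, beta, R)
        shat = damp * shat + (1.0 - damp) * z / prec   # damped slab-mean update
        logit = (b + 0.5 * np.log(alpha / prec)
                 + 0.5 * z**2 / prec - 0.5 * alpha * mu**2)
        hhat = damp * hhat + (1.0 - damp) * sigmoid(logit)  # damped spike update
    return hhat, shat
```

For simplicity this sketch reuses the same residual for both updates within an iteration; a stricter coordinate scheme would recompute it between the slab and spike steps.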
3. Learning: Parameter Estimation via Variational EM
Parameters are learned by maximizing the variational lower bound (evidence lower bound, ELBO) via variational EM:
- E-step: Run the above fixed-point updates to obtain variational parameters for each data point.
- M-step: Maximize the expected complete-data log-likelihood $\mathbb{E}_{Q}\!\left[\log p(v, h, s)\right]$ with respect to the model parameters.
Closed-form updates exist for $W$, $b$, $\mu$, $\alpha$, and $\beta$, though in practice small gradient steps are often preferred for stability (Goodfellow et al., 2012).
- $W$ is updated (with column normalization) via the linear-Gaussian regression solution
$$W \leftarrow \Big(\sum_n v^{(n)}\, \mathbb{E}_Q\big[(h \circ s)^{(n)}\big]^\top\Big)\Big(\sum_n \mathbb{E}_Q\big[(h \circ s)(h \circ s)^\top\big]^{(n)}\Big)^{-1},$$
followed by rescaling each column of $W$ to unit norm. Analogous analytic updates are provided for the noise precision, spike biases, and slab parameters.
The E- and M-steps are alternated until convergence. Convergence in the E-step typically requires only a small number of parallel iterations (Goodfellow et al., 2012).
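The closed-form dictionary update can be sketched as follows. This assumes the factored posterior moments described above; the ridge term `eps` and all names are my own additions for numerical robustness, not part of the original method.

```python
import numpy as np

def m_step_W(V, Hhat, Shat, s_var, eps=1e-6):
    """Closed-form M-step for the dictionary W (sketch).

    V            : (n, D) batch of data vectors
    Hhat, Shat   : (n, N) variational spike probabilities and slab means
    s_var        : (N,) posterior slab variances (1 / posterior precision)
    """
    M = Hhat * Shat                                # E_Q[h o s], shape (n, N)
    G = M.T @ M                                    # off-diagonal second moments
    # Exact diagonal uses E_Q[h_i s_i^2] = hhat_i (shat_i^2 + var_i),
    # which differs from the squared mean on the diagonal.
    np.fill_diagonal(G, (Hhat * (Shat**2 + s_var)).sum(axis=0))
    W = np.linalg.solve(G + eps * np.eye(G.shape[0]), M.T @ V).T
    W /= np.linalg.norm(W, axis=0, keepdims=True)  # column normalization
    return W
```

Because `G` is symmetric, solving `G X = M.T @ V` and transposing yields the regression form quoted above without explicitly forming the inverse.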
4. Computational Scalability and Parallel Inference
S3C's GPU-adapted variational inference is based on fully parallel updates of all spike and slab parameters, with per-variable damping and sign-flip clipping. Each E-step iteration consists of batched matrix-vector operations and elementwise non-linearities, decomposing into parallelizable BLAS calls (Goodfellow et al., 2012). The algorithmic structure is:
- Initialize $\hat{h}$ and $\hat{s}$.
- For $T$ iterations:
  - Compute $\hat{s}_i$ for all $i$ in parallel; apply clipping and damping.
  - Compute $\hat{h}_i$ for all $i$ in parallel; apply damping.
This approach allows scaling to thousands of latent factors, tens of millions of image patches, and large-batch feature extraction (Goodfellow et al., 2012).
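The per-variable stabilization step can be illustrated with a small helper. The specific clipping rule here (bounding a value whose sign flipped) is a heuristic in the spirit of the description above; the paper's exact rule may differ.

```python
import numpy as np

def damped_clipped_update(old, proposed, damp=0.5, clip=1.0):
    """Per-variable damped update with sign-flip clipping (sketch).

    Blends the proposed value with the previous one, then bounds the
    magnitude of any variable whose sign changed, which suppresses
    oscillation in the parallel fixed-point iteration.
    """
    new = damp * old + (1.0 - damp) * proposed
    flipped = np.sign(new) != np.sign(old)
    return np.where(flipped, np.clip(new, -clip, clip), new)
```

Because every variable is treated independently, this step is trivially vectorizable and fits naturally between the batched BLAS calls of each E-step iteration.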
5. Applications: Feature Discovery and Classification Performance
S3C is principally used as an unsupervised feature learner for image classification, transfer learning, and semi-supervised learning scenarios (Goodfellow et al., 2012). The standard processing pipeline on images is:
- Extract normalized, whitened image patches.
- Run S3C variational inference per patch to obtain activations.
- Pool activations spatially on a coarse grid (e.g., 3×3), yielding high-dimensional feature vectors.
- Train a linear SVM on pooled features for classification.
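The pooling step above can be sketched with generic sum-pooling over a coarse grid. This is an illustration of the pipeline structure, not the paper's exact pooling function; names and shapes are assumed.

```python
import numpy as np

def pool_features(H, grid=3):
    """Spatially pool per-patch S3C activations (sketch).

    H : (rows, cols, N) array of spike probabilities hhat, one vector
        per patch location in the image.
    Sums activations within each cell of a grid x grid partition and
    concatenates them into a (grid*grid*N,) feature vector suitable
    for a linear SVM.
    """
    rows, cols, N = H.shape
    r_edges = np.linspace(0, rows, grid + 1).astype(int)
    c_edges = np.linspace(0, cols, grid + 1).astype(int)
    pooled = [H[r0:r1, c0:c1].sum(axis=(0, 1))
              for r0, r1 in zip(r_edges[:-1], r_edges[1:])
              for c0, c1 in zip(c_edges[:-1], c_edges[1:])]
    return np.concatenate(pooled)
```

Pooling discards exact patch positions while retaining coarse spatial layout, which is what makes the resulting fixed-length vector usable by a linear classifier.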
On CIFAR-10, S3C with 3×3 pooling achieved accuracy competitive with state-of-the-art sparse coding and outperformed spike-and-slab RBMs. On the "self-taught" Transfer Learning Challenge, S3C won the competition using only 120 labels and 100,000 unlabeled examples (Goodfellow et al., 2012). In low-label regimes, S3C outperforms both raw-pixel and logistic-regression baselines due to its flexible regularization.
6. Model Interpretability and Comparison to Related Approaches
S3C combines gated continuous latents (from sparse coding) with an explicit spike-and-slab prior (as in spike-and-slab RBMs), providing independent control of sparsity (via the spike biases $b$) and scale (via the slab parameters $\mu$ and $\alpha$). As a directed model, S3C has a tractable partition function, avoiding the intractability of undirected models like RBMs and enabling efficient variational inference. The structured variational E-step captures some, though not all, posterior dependencies ("explaining away" among spikes), surpassing fully factored mean-field approaches in tasks such as source separation and denoising (Sheikh et al., 2012, Lücke et al., 2011).
In contrast, MAP or greedy algorithms often employ convex relaxations (e.g., LASSO) or combinatorial support selection (as in adaptive ADMM methods), but do not model the full latent uncertainty structure of S3C (Bayisa et al., 2018).
7. Extensions and Empirical Observations
Empirical studies show the truncated EM approach—where the posterior is truncated to the most probable spike patterns—outperforms factored variational inference, particularly under high noise or for highly non-orthogonal dictionaries, due to its better approximation of multi-modal and correlated posterior mass (Sheikh et al., 2012). S3C continues to improve with increased latent dimensionality, unlike standard factored methods where performance often saturates or degrades.
Experiments on source separation, denoising, and image classification consistently demonstrate the value of the spike-and-slab framework in inducing both accurate and highly sparse representations (Goodfellow et al., 2012, Sheikh et al., 2012). The model's GPU-friendly inference and scalability enable applications to modern large-scale recognition and transfer-learning challenges (Goodfellow et al., 2012).