Poisson VAE: Spike-Based Autoencoder
- Poisson VAE is a neural model that replaces Gaussian latent variables with discrete, Poisson-distributed spike counts to achieve biologically plausible representations.
- It relies on reparameterization tricks and surrogate gradients to train the discrete latent variables efficiently, yielding robust inference and broad utilization of a high-dimensional latent space.
- The framework is applied to spiking neural networks and perceptual decision models, enabling improved sample efficiency and alignment with metabolic cost constraints found in brain-like systems.
A Poisson Variational Autoencoder (Poisson VAE or P-VAE) is a subclass of variational autoencoder architectures that replaces the standard continuous latent variables (typically Gaussian) with biologically-inspired discrete Poisson-distributed latent spike counts. The model is motivated by the quest for interpretable, brain-like, and metabolically plausible representations that bridge Bayesian inference, predictive coding, and sparse coding in computational neuroscience and machine learning (Vafaii et al., 23 May 2024). The Poisson VAE is also directly applicable to spiking neural networks and temporally resolved perceptual decision models (Zhan et al., 2023, Johnson et al., 14 Nov 2025).
1. Probabilistic Model Structure
The Poisson VAE models the latent code as a vector of spike counts over a fixed time window, where each component is independently Poisson distributed. The generative component consists of:
- Latent prior:
$$p(\mathbf{z}) = \prod_{k=1}^{K} \mathrm{Poisson}(z_k;\, r_k),$$
with $r_k$ as (fixed or learnable) baseline firing rates for each latent "neuron".
- Decoder (likelihood):
$$p_\theta(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}\!\left(\mathbf{x};\, f_\theta(\mathbf{z}),\, \sigma^2 I\right).$$
Observed data are modeled by this Gaussian likelihood, where $f_\theta$ is either (a) a neural-network decoder or (b) a linear mapping for analytical tractability.
The inference network ("encoder") produces the parameters for the approximate posterior:
$$q_\phi(\mathbf{z} \mid \mathbf{x}) = \prod_{k=1}^{K} \mathrm{Poisson}\!\left(z_k;\, r_k \cdot \delta r_k(\mathbf{x})\right),$$
where $\delta r_k(\mathbf{x})$ is a multiplicative modulation factor computed by the encoder.
Direct backpropagation through Poisson draws is not trivial; differentiable surrogates are achieved through reparameterization tricks such as approximating spike generation with soft thresholds or surrogate gradients (Vafaii et al., 23 May 2024, Zhan et al., 2023).
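A minimal PyTorch sketch of this structure, assuming learnable log prior rates and an encoder that outputs log modulation factors (module and variable names here are illustrative, not from the reference implementation):

```python
import torch
import torch.nn as nn

class PoissonVAE(nn.Module):
    """Illustrative P-VAE skeleton (hypothetical, simplified)."""
    def __init__(self, x_dim: int, z_dim: int, hidden: int = 256):
        super().__init__()
        # Log baseline firing rates r_k (learnable prior parameters).
        self.log_r = nn.Parameter(torch.zeros(z_dim))
        # Encoder outputs the log of the multiplicative modulation delta_r(x).
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, z_dim)
        )
        # Linear decoder for analytical tractability; a deep net also works.
        self.decoder = nn.Linear(z_dim, x_dim)

    def posterior_rates(self, x: torch.Tensor):
        r = self.log_r.exp()             # prior rates r_k
        delta_r = self.encoder(x).exp()  # modulation factors delta_r_k(x)
        return r, r * delta_r            # posterior rates lambda_k

    def forward(self, x: torch.Tensor):
        r, lam = self.posterior_rates(x)
        # Non-differentiable draw; replaced by the surrogate in Section 3.
        z = torch.poisson(lam)
        return self.decoder(z), r, lam
```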
2. Variational Objective and Metabolic Cost
Maximizing the evidence lower bound (ELBO) yields the following negative ELBO (to be minimized):
$$\mathcal{L}(\theta, \phi; \mathbf{x}) = -\,\mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\!\left[\log p_\theta(\mathbf{x} \mid \mathbf{z})\right] + D_{\mathrm{KL}}\!\left(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\right).$$
The KL-divergence between two Poisson distributions takes the closed form:
$$D_{\mathrm{KL}}\!\left(\mathrm{Poisson}(\lambda) \,\|\, \mathrm{Poisson}(r)\right) = \lambda \log \frac{\lambda}{r} + r - \lambda.$$
The KL term acts as a "metabolic cost" on firing rates, enforcing sparsity: for fixed prior rates it grows approximately linearly in the posterior rates, so the objective effectively contains a penalty mirroring the L1 penalties of classic sparse coding,
$$\beta \sum_{k=1}^{K} \lambda_k,$$
with $\lambda_k = r_k \cdot \delta r_k(\mathbf{x})$ the posterior rates and $\beta$ as a scaling hyperparameter (Vafaii et al., 23 May 2024).
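The closed-form KL and the resulting negative ELBO translate directly into code; a hedged sketch assuming the Gaussian decoder likelihood above, with `beta` as the KL scaling hyperparameter:

```python
import torch
import torch.nn.functional as F

def poisson_kl(lam: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Closed-form KL( Poisson(lam) || Poisson(r) ), summed over latents."""
    return (lam * torch.log(lam / r) + r - lam).sum(dim=-1)

def neg_elbo(x, x_hat, lam, r, beta: float = 1.0) -> torch.Tensor:
    # Gaussian reconstruction term (up to constants) plus the KL
    # "metabolic cost", which grows roughly linearly in lam.
    recon = F.mse_loss(x_hat, x, reduction="none").sum(dim=-1)
    return (recon + beta * poisson_kl(lam, r)).mean()
```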
3. Training and Inference Techniques
Standard VAE training with Adam or Adamax optimizers and KL-annealing schedules is employed. For the discrete Poisson latent variables, the reparameterization trick uses the observation that Poisson counting can be simulated via exponential waiting times, with differentiable surrogates created by smoothing the counting process (e.g., soft sigmoids in place of hard steps) (Vafaii et al., 23 May 2024, Zhan et al., 2023). ESVAE (Zhan et al., 2023) advances this with a reparameterizable spike-based sampler and surrogate "straight-through" gradients for binary spike events.
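A minimal sketch of the exponential-waiting-time relaxation (an illustrative version of the idea, not the reference code; `n_max` and `temp` are assumed hyperparameters):

```python
import torch

def soft_poisson_sample(lam: torch.Tensor, n_max: int = 50,
                        temp: float = 0.1) -> torch.Tensor:
    """Differentiable surrogate for z ~ Poisson(lam) over a unit window.

    Inter-spike intervals are Exp(lam); the count is the number of
    cumulative arrival times falling inside the window. The hard
    indicator 1[T_i < 1] is relaxed to a sigmoid so gradients reach lam.
    """
    u = torch.rand(*lam.shape, n_max, device=lam.device)
    waits = -torch.log(u) / lam.unsqueeze(-1)   # Exp(lam) via inverse CDF
    arrivals = waits.cumsum(dim=-1)             # T_1 < T_2 < ... < T_n_max
    return torch.sigmoid((1.0 - arrivals) / temp).sum(dim=-1)
```

As `temp` tends to zero the soft count approaches the exact Poisson draw, at the cost of vanishing gradients; in practice `temp` trades off bias against gradient signal.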
Training is robust to posterior collapse — the P-VAE exhibits nearly complete activity in its latent space (only 2% "dead" latents), compared with 80% inactivity in standard Gaussian and Laplace VAEs (Vafaii et al., 23 May 2024).
4. Representation Geometry and Downstream Performance
The geometry of P-VAE latent representations has the following empirical properties (Vafaii et al., 23 May 2024):
- High-dimensionality:
The participation ratio $\mathrm{PR} = \left(\sum_i e_i\right)^2 / \sum_i e_i^2$, computed from the eigenvalues $e_i$ of the latent covariance $\mathrm{Cov}[\mathbf{z}]$, is higher in P-VAE than in Gaussian or Laplace VAEs. This indicates a broader spread in latent usage (see the sketch after this list).
- Linear separability and shattering:
Nonparametric K-NN classification in the latent space reaches a given test accuracy with roughly one fifth of the labeled samples required by the Gaussian VAE, a fivefold gain in sample efficiency for downstream classification.
- Shattering dimension:
Logistic regression across all $10$-choose-$5 = 252$ binary class splits of MNIST confirms a higher shattering dimension for P-VAE representations than for Gaussian VAE representations.
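The participation ratio referenced above can be computed directly from latent samples; a minimal sketch (function name is illustrative):

```python
import torch

def participation_ratio(z: torch.Tensor) -> float:
    """PR = (sum_i e_i)^2 / sum_i e_i^2 over eigenvalues e_i of the
    latent covariance. Ranges from 1 (one direction dominates) to
    z_dim (all directions used equally). z has shape (n_samples, z_dim).
    """
    evals = torch.linalg.eigvalsh(torch.cov(z.T))
    return (evals.sum() ** 2 / (evals ** 2).sum()).item()
```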
In linear-decoder settings, the P-VAE’s learned basis functions (rows of the decoder matrix) are Gabor-like, reminiscent of receptive fields in biological visual cortex and of classical sparse coding methods (Vafaii et al., 23 May 2024).
5. Extensions and Spiking Implementations
Integration into spiking neural networks is realized in the ESVAE architecture (Zhan et al., 2023). This approach models both the posterior and prior over latent spike counts as product Poissons, where spike counts are estimated from observed SNN firing rates. ESVAE enables direct and interpretable sampling, replacing implicit autoregressive Bernoulli latent samplers.
ESVAE employs an MMD (maximum mean discrepancy) surrogate loss on firing rates, which improves sample diversity and robustness to noise and temporal jitter, and yields superior reconstruction and generation metrics on image datasets such as CIFAR-10 versus competing frameworks (e.g., FSVAE's FID $175.5$ and Inception score $2.945$ vs. ESVAE's FID $127.0$ and Inception score $3.758$) (Zhan et al., 2023).
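A common form of such an MMD term is the kernel two-sample statistic between posterior and prior firing-rate batches; a sketch with an RBF kernel (the kernel choice and bandwidth here are assumptions, not necessarily ESVAE's exact configuration):

```python
import torch

def rbf_mmd2(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate between two batches of firing-rate vectors."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()
```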
6. Applications and Neuroscientific Relevance
P-VAE frameworks offer a formal link between Bayesian inference, predictive coding, and sparse neural coding in perception. The discrete, non-negative spike count code, metabolic cost (reflecting energy efficiency constraints), and separate feedforward/feedback mechanisms confer high biological realism (Vafaii et al., 23 May 2024, Johnson et al., 14 Nov 2025). The predictive-coding variant, with the encoder producing "error-driven" multiplicative modulations of baseline rates, further aligns with established models of neural error signaling in cortex.
In principled models of perceptual decision making, a P-VAE can provide a trial-by-trial account of choices and response times. Spike-count latent accumulation allows the generative model to capture key psychophysical signatures such as right-skewed RT distributions, Hick's law (RT grows with the logarithm of the number of alternatives), stochastic response variability, and speed–accuracy trade-offs (Johnson et al., 14 Nov 2025).
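To make the accumulation idea concrete, a toy Poisson race (illustrative parameters only, not the fitted model of Johnson et al.) already produces right-skewed RT distributions and a threshold-mediated speed-accuracy trade-off:

```python
import numpy as np

def poisson_race_rt(n_alternatives: int, rate_correct: float = 20.0,
                    rate_other: float = 15.0, threshold: int = 30,
                    dt: float = 1e-3, rng=None):
    """One accumulator per alternative integrates Poisson spike counts;
    the first to reach `threshold` determines choice and response time."""
    rng = rng or np.random.default_rng()
    rates = np.full(n_alternatives, rate_other)
    rates[0] = rate_correct          # alternative 0 is "correct"
    counts = np.zeros(n_alternatives)
    t = 0.0
    while counts.max() < threshold:
        counts += rng.poisson(rates * dt)
        t += dt
    return int(counts.argmax()), t   # (choice, RT in seconds)
```

Raising `threshold` slows responses but makes the high-rate accumulator win more often, the classic speed-accuracy trade-off.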
7. Limitations and Prospective Directions
A Gaussian decoder is a coarse choice for pixels; modifications to use Bernoulli or discretized logistic likelihoods are needed for certain data types (Johnson et al., 14 Nov 2025). Score-function (e.g., REINFORCE) estimators for discrete latent gradients introduce variance; variance-reducing baselines can partially address this. Prior-rate selection and static encoding are areas for improvement: inhomogeneous or continuous-time Poisson models and learnable prior rates are promising future directions, as is the adoption of more expressive posterior distributions (e.g., mixtures of Poissons or normalizing flows on rates) (Johnson et al., 14 Nov 2025).
Key References
- "Poisson Variational Autoencoder" (Vafaii et al., 23 May 2024)
- "ESVAE: An Efficient Spiking Variational Autoencoder with Reparameterizable Poisson Spiking Sampling" (Zhan et al., 2023)
- "Inferring response times of perceptual decisions with Poisson variational autoencoders" (Johnson et al., 14 Nov 2025)