
Variational Predictive Coding (VPC)

Updated 1 February 2026
  • Variational Predictive Coding is a framework that formalizes hierarchical Bayesian inference by integrating predictive coding theories with deep generative models and local Hebbian learning.
  • It employs precision-weighted prediction errors and local gradient descent updates to achieve rapid convergence and effective uncertainty quantification.
  • VPC enhances sample-efficient learning and robust performance across applications such as speech representation, vision tasks, and goal-directed planning.

Variational predictive coding (VPC) encompasses a class of Bayesian inference frameworks that formalize predictive coding—originally a theory of cortical information processing—within the variational free-energy paradigm. These frameworks generalize classical predictive coding to probabilistic deep generative models, providing a biologically and information-theoretically principled alternative to backpropagation for learning and inference. VPC unifies multi-layer hierarchical generative models, variational posteriors, precision-weighted prediction errors, and local Hebbian learning rules, and has led to novel architectures and algorithms with enhanced uncertainty quantification, faster convergence, and sample-efficient learning in high-dimensional structured domains.

1. Mathematical Foundations: Generative Models and Variational Free Energy

VPC is grounded in hierarchical latent-variable generative models. Consider a Gaussian hierarchy with $L$ hidden layers, latent states $\mathbf{Z} = \{\mathbf{z}_l\}_{l=0}^L$, and parameters $\boldsymbol{\Theta} = \{(\mathbf{W}_l, \boldsymbol{\Sigma}_l)\}_{l=1}^L$. The joint density is

$$
\begin{aligned}
p(\mathbf{Z},\boldsymbol{\Theta}) &= p(\mathbf{z}_0) \prod_{l=1}^L p(\mathbf{z}_l \mid \mathbf{z}_{l-1}, \mathbf{W}_l, \boldsymbol{\Sigma}_l)\, p(\mathbf{W}_l, \boldsymbol{\Sigma}_l), \\
p(\mathbf{z}_l \mid \mathbf{z}_{l-1}, \mathbf{W}_l, \boldsymbol{\Sigma}_l) &= \mathcal{N}\bigl(\mathbf{z}_l \mid \mathbf{W}_l f(\mathbf{z}_{l-1}), \boldsymbol{\Sigma}_l\bigr),
\end{aligned}
$$

with priors chosen from conjugate families, e.g., a Gaussian for $\mathbf{z}_0$ and a Matrix-Normal-Wishart for $(\mathbf{W}_l, \boldsymbol{\Sigma}_l)$ (Tschantz et al., 31 Mar 2025).
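As an illustrative sketch of this generative model (not a reference implementation: the layer widths are made up and $f$ is assumed to be tanh), ancestral sampling through the hierarchy looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hierarchy(weights, covs, z0, f=np.tanh):
    """Ancestral sampling through the Gaussian hierarchy:
    z_l ~ N(W_l f(z_{l-1}), Sigma_l)."""
    zs = [z0]
    for W, Sigma in zip(weights, covs):
        mean = W @ f(zs[-1])                    # layer-wise predicted mean
        zs.append(rng.multivariate_normal(mean, Sigma))
    return zs

# Hypothetical 3-level hierarchy with widths 4 -> 3 -> 2
dims = [4, 3, 2]
weights = [rng.normal(size=(dims[l + 1], dims[l])) for l in range(2)]
covs = [0.1 * np.eye(dims[l + 1]) for l in range(2)]
z0 = rng.normal(size=dims[0])
zs = sample_hierarchy(weights, covs, z0)
```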

The variational free energy (negative ELBO) objective is

$$
F(\mathbf{Z}, \boldsymbol{\lambda}) = \mathbb{E}_{q(\boldsymbol{\Theta};\boldsymbol{\lambda})}\bigl[\ln q(\boldsymbol{\Theta};\boldsymbol{\lambda}) - \ln p(\mathbf{Z}, \boldsymbol{\Theta})\bigr],
$$

where $\boldsymbol{\lambda}$ denotes the variational parameters. Under a mean-field/Laplace approximation for the latent states, $q(\mathbf{Z})$ becomes a Dirac delta, yielding precision-weighted quadratic prediction errors at each layer and generalizing classical PC (Salvatori et al., 2023).
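Under this Dirac-delta approximation with point parameters, the free energy evaluates to a sum of precision-weighted squared prediction errors. A minimal sketch, omitting the log-determinant constants and assuming tanh for $f$:

```python
import numpy as np

def free_energy(zs, weights, precisions, f=np.tanh):
    """Variational free energy under a Dirac-delta q(Z) and point
    parameters: a sum of precision-weighted quadratic prediction
    errors (additive log-determinant constants omitted)."""
    F = 0.0
    for l, (W, P) in enumerate(zip(weights, precisions), start=1):
        eps = zs[l] - W @ f(zs[l - 1])          # prediction error at layer l
        F += 0.5 * eps @ P @ eps                # precision weighting
    return F
```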

2. Inference Dynamics: Precision-Weighted Prediction Errors

Inference in VPC proceeds via gradient descent on the variational free energy with respect to the latent states. The update rule for an intermediate latent state $\mathbf{z}_l$ is

zlzlα12[Σl1(zlWlf(zl1))D(zl)Wl+1Σl+11(zl+1Wl+1f(zl))],\mathbf{z}_l \leftarrow \mathbf{z}_l - \alpha \frac{1}{2}\left[ \Sigma_l^{-1} (\mathbf{z}_l - \mathbf{W}_l f(\mathbf{z}_{l-1})) - \mathbf{D}(\mathbf{z}_l) \mathbf{W}_{l+1}^\top \Sigma_{l+1}^{-1} (\mathbf{z}_{l+1}-\mathbf{W}_{l+1} f(\mathbf{z}_l)) \right],

where $\mathbf{D}(\mathbf{z}_l) = \mathrm{diag}(f'(\mathbf{z}_l))$ is the local Jacobian, encoding dendritic derivatives (Tschantz et al., 31 Mar 2025). These updates are local: each neuron requires only its own prediction error, its dendritic Jacobian, and signals from adjacent layers.
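The update rule above can be sketched directly; this is an illustrative implementation (tanh assumed for $f$, and the convention that `weights[k]` maps layer $k$ to layer $k{+}1$ is ours):

```python
import numpy as np

def update_latent(zs, l, weights, precisions, alpha=0.05,
                  f=np.tanh, fprime=lambda z: 1.0 - np.tanh(z) ** 2):
    """One gradient step on an intermediate latent state z_l, combining
    the bottom-up error at layer l with the top-down error from layer
    l+1, precision-weighted as in the update rule above."""
    W_l, W_up = weights[l - 1], weights[l]
    P_l, P_up = precisions[l - 1], precisions[l]
    eps_l = zs[l] - W_l @ f(zs[l - 1])      # prediction error at layer l
    eps_up = zs[l + 1] - W_up @ f(zs[l])    # error at the layer above
    D = np.diag(fprime(zs[l]))              # D(z_l) = diag(f'(z_l))
    grad = P_l @ eps_l - D @ W_up.T @ P_up @ eps_up
    zs[l] = zs[l] - alpha / 2 * grad
    return zs
```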

Hybrid forms of inference combine fast amortized (feedforward) initialization with slow, iterative precision-weighted refinement. This yields a unified process theory for both rapid (e.g., feedforward sweep in visual cortex) and context-sensitive perception (Tschantz et al., 2022).

3. Bayesian Treatment of Parameters and Local Hebbian Learning

VPC extends classical PC from maximum likelihood/maximum a posteriori estimation to full Bayesian inference over model parameters. Specifically, for each layer with Gaussian conditional likelihood and Matrix-Normal-Wishart prior, the variational posterior inherits the same family. Closed-form natural-parameter updates are given by sums of sufficient statistics:

$$
\boldsymbol{\eta}_l^\star = \boldsymbol{\eta}_l^{(0)} + \sum_{n=1}^N \Bigl( f(\mathbf{z}_{l-1}^{(n)})\, f(\mathbf{z}_{l-1}^{(n)})^\top,\; f(\mathbf{z}_{l-1}^{(n)})\, \mathbf{z}_l^{(n)\top},\; \mathbf{z}_l^{(n)} \mathbf{z}_l^{(n)\top},\; 1 \Bigr).
$$

Such blockwise updates are Hebbian in nature, as they involve pre–pre, pre–post, and post–post outer products (Tschantz et al., 31 Mar 2025). This preserves both locality and biological plausibility.
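These sufficient statistics are batched outer products of pre- and postsynaptic activity; a sketch for one layer (tanh assumed for $f$, with rows of the inputs as samples):

```python
import numpy as np

def sufficient_stats(z_prev_batch, z_batch, f=np.tanh):
    """Blockwise sufficient statistics for one layer, summed over a
    batch of N samples, matching the natural-parameter update:
    pre-pre, pre-post, post-post outer products, plus a count."""
    F_ = f(z_prev_batch)                    # (N, d_prev) presynaptic activity
    return (F_.T @ F_,                      # sum_n f(z_{l-1}) f(z_{l-1})^T
            F_.T @ z_batch,                 # sum_n f(z_{l-1}) z_l^T
            z_batch.T @ z_batch,            # sum_n z_l z_l^T
            len(z_batch))                   # N
```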

4. Algorithmic Workflow and Locality

A canonical VPC algorithm alternates between two stages:

  • E-step (Inference): Clamp inputs and outputs; iteratively relax intermediate latent states using precision-weighted gradient descent until fixed points of the free energy are reached.
  • M-step (Learning): Update natural parameters of the weight posteriors in closed form via local sufficient statistics (Tschantz et al., 31 Mar 2025).

This is expressed in fully local circuits: each layer computes errors, Jacobians, and sufficient statistics based solely on messages from its Markov blanket (adjacent layers in deep models, parents/children in graphical models) (Sennesh et al., 2024, Salvatori et al., 2023). Extensions such as Divide-and-Conquer Predictive Coding allow scalable inference in arbitrary graphical models, maintaining the locality and blockwise update properties (Sennesh et al., 2024).
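A self-contained toy version of this E/M alternation is sketched below. The layer widths and learning rates are illustrative, precisions are held fixed, and a simplified local delta-rule weight update stands in for the full closed-form natural-parameter update over the Matrix-Normal-Wishart posterior:

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.tanh
dims = [4, 3, 2]                                 # illustrative layer widths
W = [0.1 * rng.normal(size=(dims[l + 1], dims[l])) for l in range(2)]
P = [np.eye(d) for d in dims[1:]]                # fixed precisions for brevity
x0, xL = rng.normal(size=dims[0]), rng.normal(size=dims[2])

for epoch in range(10):
    # E-step: clamp input/output, relax the intermediate latent z_1
    z = [x0, np.zeros(dims[1]), xL]
    for _ in range(50):
        e1 = z[1] - W[0] @ f(z[0])               # bottom-up error
        e2 = z[2] - W[1] @ f(z[1])               # top-down error
        D = np.diag(1.0 - f(z[1]) ** 2)          # diag(f'(z_1)) for tanh
        z[1] -= 0.1 * 0.5 * (P[0] @ e1 - D @ W[1].T @ P[1] @ e2)
    # M-step (simplified): local Hebbian-style weight update from the
    # relaxed states, in place of the closed-form posterior update.
    for l in range(2):
        pre, post = f(z[l]), z[l + 1]
        W[l] += 0.05 * np.outer(post - W[l] @ pre, pre)
```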

5. Uncertainty Quantification and Convergence Properties

Maintaining full posteriors $q(\mathbf{W}_l, \boldsymbol{\Sigma}_l)$ over parameters enables quantification of both aleatoric uncertainty (via $\boldsymbol{\Sigma}_l$) and epistemic uncertainty (the posterior spread of $\mathbf{W}_l$). VPC achieves uncertainty quantification competitive with Bayesian deep learning methods like Bayes-by-Backprop, while preserving predictive-coding locality (Tschantz et al., 31 Mar 2025).
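The decomposition can be illustrated with predictive sampling. This sketch assumes a factorized Gaussian weight posterior, a deliberate simplification of the Matrix-Normal-Wishart form used in the framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_samples(W_mean, W_std, Sigma, z_prev, n=1000, f=np.tanh):
    """Decompose predictive uncertainty for one layer: sample weights
    from a (hypothetical, factorized Gaussian) posterior for the
    epistemic part, then add observation noise Sigma for the
    aleatoric part."""
    h = f(z_prev)
    Ws = W_mean + W_std * rng.normal(size=(n, *W_mean.shape))
    means = Ws @ h                               # epistemic spread over means
    noise = rng.multivariate_normal(np.zeros(len(Sigma)), Sigma, size=n)
    return means + noise                         # total predictive samples
```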

Convergence analysis shows that in full-batch settings, VPC’s closed-form M-step yields rapid convergence, often in a handful of epochs, compared to gradient-based approaches. In mini-batch or stochastic settings, natural-gradient updates maintain competitive accuracy. Empirical studies on standard UCI benchmarks and deep vision tasks demonstrate improved log-likelihoods, sample diversity, and predictive power when integrating uncertainty regularization through Hessian/curvature terms or sampling-based updates (Zahid et al., 2023).

6. Practical Variants and Applications

Recent lines extend VPC to sequential models (PV-RNN, VBP-RNN) via recurrent hierarchical architectures with stochastic latent states and adaptive vector mechanisms (Ahmadi et al., 2018, Ahmadi et al., 2017). These models regulate the balance between reconstruction and regularization by meta-prior hyper-parameters, tuning the trade-off between deterministic memorization and probabilistic generalization.

In speech representation learning, the VPC formalism subsumes and improves HuBERT and related self-supervised objectives (APC, CPC, wav2vec, BEST-RQ) by interpreting them as instantiations of the VPC ELBO under various masking, quantization, and codebook parameterizations (Yeh et al., 31 Dec 2025). Softening the quantizer and sampling within the reconstruction term yield immediate downstream gains in phone classification, speaker verification, and ASR.

VPC has also enabled sample-efficient goal-directed planning under naturalistic constraints by embedding adaptive attention, visual working memory, and top-down/bottom-up predictive coding style error regression (Jung et al., 2019).

7. Relation to Other Methods and Theoretical Significance

VPC is a generalization of classical predictive coding that admits arbitrary likelihoods, prior families, and structured variational posteriors. It encompasses classical predictive coding (MAP/ML estimation), Bayesian deep learning (Bayes-by-Backprop), iterative/amortized inference hybrids, and Langevin sampling-based approaches.

Compared to standard backpropagation, VPC offers strictly local error and learning signals, supports complex non-feedforward architectures, and is intrinsically biologically plausible by mapping updates to local Hebbian rules and error neurons (Salvatori et al., 2023). Uncertainty quantification, stability, and robustness to out-of-distribution shifts are direct consequences of the variational paradigm (Boutin et al., 2020, Zahid et al., 2023).

Recent advances demonstrate VPC’s applicability in large-scale deep generative models, adaptive perceptual control, sample-efficient planning, robust representation learning, and structured inference in probabilistic graphical models. A plausible implication is that VPC provides a scalable framework for both biologically realistic cortical computation and effective deep learning in high-dimensional, structured environments.
