
Incremental Predictive Coding (iPC)

Updated 6 January 2026
  • iPC is a fully automatic predictive coding algorithm that incrementally updates both latent variables and synaptic weights, eliminating the need for separate inference and learning phases.
  • It interleaves inference and learning at every time step, achieving faster convergence, greater robustness across hyperparameters, and improved biological plausibility through local updates.
  • Empirical studies show iPC's superior performance in image classification and language modeling, with theoretical convergence guarantees provided by the incremental EM framework.

Incremental Predictive Coding (iPC) is a fully automatic learning algorithm for predictive coding networks, designed to overcome significant limitations in the original predictive coding (PC) formulation. Rooted in both Bayesian statistics and neuroscience, iPC addresses inefficiency and instability in training by interleaving inference and learning phases at every time step. This scheduling change enables iPC to achieve faster convergence, greater robustness across a broad range of hyperparameters, and improved biological plausibility, particularly in synaptic update autonomy. Extensive empirical studies demonstrate iPC’s superior performance in image classification and masked language modeling, with theoretical convergence guarantees substantiated via the incremental EM framework (Salvatori et al., 2022).

1. Predictive Coding Networks: Fundamentals and Original Learning Rule

Predictive coding (PC) is a hierarchical network approach aimed at minimizing a “variational free energy,” which here reduces to the total squared prediction error. The standard generative model is specified layerwise by

\mu^{(l)} = \theta^{(l)}\, f\!\left(x^{(l+1)}\right), \qquad l = 0, \dots, L-1

where $x^{(l)}$ are latent state variables and $\theta^{(l)}$ are synaptic weights. The instantaneous, layerwise prediction error is

\varepsilon^{(l)} = x^{(l)} - \mu^{(l)}

with the aggregate energy (free-energy) defined as

F = \sum_{l=0}^{L-1} \left\|\varepsilon^{(l)}\right\|^{2}
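
Both update rules below can be read as gradient-descent steps on $F$; with constant factors absorbed into the step sizes, the relevant partial derivatives of the energy above are

\frac{\partial F}{\partial x^{(l)}} \;\propto\; \varepsilon^{(l)} - f'\!\left(x^{(l)}\right) \odot \left[(\theta^{(l-1)})^{\top}\varepsilon^{(l-1)}\right], \qquad \frac{\partial F}{\partial \theta^{(l)}} \;\propto\; -\,\varepsilon^{(l)}\, f\!\left(x^{(l+1)}\right)^{\top}

so a descent step in $x^{(l)}$ with step size $\gamma$ and in $\theta^{(l)}$ with step size $\alpha$ yields the inference and learning updates, respectively.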

In the original PC algorithm, learning alternates between:

  • Inference (E-step): Updating latent variables $x^{(l)}$ using fixed weights $\theta$, by gradient-descent steps (typically repeated $T \gg 1$ times per sample): $x^{(l)} \leftarrow x^{(l)} + \gamma\left[-\varepsilon^{(l)} + f'(x^{(l)}) \odot (\theta^{(l-1)})^{\top}\varepsilon^{(l-1)}\right]$
  • Learning (M-step): Upon inference convergence, updating weights through

\theta^{(l)} \leftarrow \theta^{(l)} + \alpha\,\varepsilon^{(l)}\, f\!\left(x^{(l+1)}\right)^{\top}

Despite locality in the update rules, PC requires an external “switch” to decouple inference and learning phases—contrary to biological autonomy.
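
For contrast with iPC below, a minimal NumPy-style sketch of this two-phase schedule on a single training example (the activation f, its derivative f_prime, the weight list theta, the state list x with x[0] clamped to the data, and the constants T_inf, gamma, alpha, L are assumed to be defined; an illustration, not the paper's reference implementation):

import numpy as np

# E-step: relax latent states to (approximate) convergence with the weights frozen
for _ in range(T_inf):                      # T_inf >> 1 inference iterations per sample
    epsilon = [x[l] - theta[l] @ f(x[l + 1]) for l in range(L)]
    for l in range(1, L):                   # x[0] stays clamped to the data point
        x[l] += gamma * (-epsilon[l] + f_prime(x[l]) * (theta[l - 1].T @ epsilon[l - 1]))

# M-step: only after inference has converged are the weights updated
for l in range(L):
    theta[l] += alpha * np.outer(epsilon[l], f(x[l + 1]))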

2. The Incremental Predictive Coding (iPC) Update Schedule and Equations

iPC fundamentally alters the temporal scheduling by concurrently and continuously updating both latent variables and synaptic weights at every iteration. For data point $i$, layer $l$, and time step $t$:

  • Inference update:

x^{(l)}(t+1) = x^{(l)}(t) + \gamma\left[-\varepsilon^{(l)}(t) + f'\!\left(x^{(l)}(t)\right) \odot \left(\theta^{(l-1)}(t)\right)^{\top}\varepsilon^{(l-1)}(t)\right]

  • Weight update:

\theta^{(l)}(t+1) = \theta^{(l)}(t) + \alpha\,\varepsilon^{(l)}(t)\, f\!\left(x^{(l+1)}(t)\right)^{\top}

This scheduling eliminates the need for an explicit “stop inference, start learning” signal. Instead, the model updates all parameters at every mini-batch iteration.
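
Equivalently (a compact restatement of the two rules above, with constant factors again absorbed into $\gamma$ and $\alpha$), every iPC iteration is one simultaneous gradient step on the free energy with respect to both sets of variables:

x^{(l)}(t+1) = x^{(l)}(t) - \gamma\,\frac{\partial F}{\partial x^{(l)}}\bigg|_{t}, \qquad \theta^{(l)}(t+1) = \theta^{(l)}(t) - \alpha\,\frac{\partial F}{\partial \theta^{(l)}}\bigg|_{t}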

3. Theoretical Analysis and Convergence Guarantees

iPC corresponds to the incremental Expectation-Maximization (EM) algorithm applied to the free-energy objective over the dataset,

F_{\rm global}(\theta) = \sum_{i=1}^{N} F(x_i, \theta)

Under standard assumptions (a bounded, continuously differentiable free energy and step sizes $(\alpha_t, \gamma_t)$ satisfying the Robbins–Monro conditions), the sequence $\{\theta(t)\}$ converges almost surely to a stationary point of $F_{\rm global}(\theta)$, ensuring that iPC finds a local minimum of the total prediction error. Convergence follows from viewing the updates as stochastic approximation steps and invoking two-time-scale analysis for the coupled variable updates.
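
The Robbins–Monro conditions require $\sum_t \alpha_t = \infty$ and $\sum_t \alpha_t^2 < \infty$. A minimal sketch of one schedule that satisfies them (the decay form and the constants a, b are illustrative choices, not values from the paper):

def robbins_monro_schedule(t, a=0.1, b=100.0):
    """Step size a / (b + t): its sum over t diverges while the sum of its squares converges."""
    return a / (b + t)

for t in range(1000):
    alpha_t = robbins_monro_schedule(t)            # weight learning rate at step t
    gamma_t = robbins_monro_schedule(t, a=0.5)     # inference step size at step t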

4. iPC Algorithmic Structure

A concise summary of the iPC procedure (see Algorithm 1):

# One combined update per time step: inference and learning interleaved, no phase switch
# (f, f_prime, theta, x, epsilon, and np (NumPy) as in the setup sketch below)
for t in range(T):
    for i in range(N):                         # all training points (updated in parallel in practice)
        for l in range(L):                     # layerwise prediction errors
            epsilon[i][l] = x[i][l] - theta[l] @ f(x[i][l + 1])
        for l in range(1, L):                  # inference step; x[i][0] stays clamped to the data
            x[i][l] += gamma * (-epsilon[i][l] + f_prime(x[i][l]) * (theta[l - 1].T @ epsilon[i][l - 1]))
    for l in range(L):                         # learning step, reusing the same errors
        theta[l] += alpha * sum(np.outer(epsilon[i][l], f(x[i][l + 1])) for i in range(N))

A key distinction from classic PC is that no external signal is required for weight updates; all parameters are updated incrementally and in parallel.
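
One possible initialization for the loop above (the layer sizes, constants, and activation choice are illustrative assumptions, not values from the paper):

import numpy as np

rng = np.random.default_rng(0)
dims = [784, 256, 128, 64]        # layer widths; layer 0 holds the data, so there are L = len(dims) - 1 weight matrices
L, N, T = len(dims) - 1, 32, 100
alpha, gamma = 1e-3, 0.1
f = np.tanh                       # activation and its derivative
f_prime = lambda v: 1.0 - np.tanh(v) ** 2
theta = [0.05 * rng.standard_normal((dims[l], dims[l + 1])) for l in range(L)]
x = [[rng.standard_normal(dims[l]) for l in range(L + 1)] for _ in range(N)]   # x[i][0] is clamped to the input in practice
epsilon = [[np.zeros(dims[l]) for l in range(L)] for _ in range(N)]

With these definitions in place, the loop above runs as written; in practice x[i][0] would be set to each training input and, for supervised tasks, the top layer clamped to the target.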

5. Experimental Evaluation Protocols and Metrics

The empirical assessment comprises three categories:

  • Efficiency on small generative/discriminative tasks: Comparing iPC, classic PC, and parallel EM across fully connected networks with depths $4$–$5$ and hidden widths $\{128, 256, 512\}$ on CIFAR-10, Tiny ImageNet, and Fashion-MNIST. Evaluation metrics include free-energy trajectories and matrix-multiplication counts.
  • Supervised image classification: MLPs, CNNs, and AlexNet-style models on MNIST, SVHN, and CIFAR-10. Hyperparameter grids span learning rates ($\alpha$, $\gamma$), weight decay, and batch sizes. Metrics: accuracy and convergence rate.
  • Language modeling via transformers: 2-layer encoder (BERT-style) or decoder (GPT-style) models on the One Billion Word Benchmark with masked and conditional next-token prediction. Metrics: development/test perplexity and the number of random seeds reaching convergence (perplexity below a threshold), as sketched below.
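
For reference, perplexity is the exponential of the average per-token negative log-likelihood; a minimal sketch of the metric (the token log-probabilities in the example are made up):

import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean log-likelihood) over the evaluated tokens (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

print(perplexity([-2.1, -0.7, -1.4]))   # three-token toy example, roughly 4.06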

6. Empirical Results and Comparative Analysis

iPC demonstrates accelerated energy minimization on generative nets, requiring fewer iterations and outperforming backpropagation (BP) under full-batch regimes.

Representative test accuracies (mean ± standard deviation, %) from the supervised image classification experiments:

| Architecture | BP (%) | PC (%) | iPC (%) |
|---|---|---|---|
| MLP (MNIST) | 98.26 ± 0.12 | 98.55 ± 0.14 | 98.54 ± 0.86 |
| MLP (Fashion-MNIST) | 88.54 ± 0.64 | 85.12 ± 0.75 | 89.13 ± 0.86 |
| CNN (SVHN) | 95.35 ± 1.53 | 94.53 ± 1.54 | 96.45 ± 1.04 |
| CNN (CIFAR-10) | 69.34 ± 0.54 | 70.84 ± 0.64 | 72.54 ± 0.93 |
| AlexNet (CIFAR-10, no augmentation) | 75.64 ± 0.64 | 64.63 ± 1.55 | 72.42 ± 0.53 |
| AlexNet (CIFAR-10, + augmentation) | 83.12 ± 0.97 | 71.99 ± 2.43 | 80.11 ± 0.44 |

For the scaling experiments, iPC outperformed BP up to width multipliers of $C \leq 10$, with BP overtaking only under extreme over-parameterization. Under dataset corruptions and out-of-distribution shifts, iPC maintained a lower expected calibration error (AdaECE: 0.05 for iPC vs. 0.12 for BP), resulting in more reliable uncertainty quantification.
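
For context, expected calibration error averages the gap between accuracy and confidence across confidence bins; below is a minimal sketch of the standard equal-width-bin version (AdaECE as reported in the study uses adaptive, equal-mass bins instead; the toy inputs are made up):

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6], [1, 0, 1]))   # toy predictions: confidence vs. correctness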

In language modeling, iPC matched or outperformed BP on masked token prediction (test perplexity: 106 ± 10.5 for iPC vs. 120 ± 13.2 for BP), with a perfect convergence rate (10/10 seeds), indicating pronounced algorithmic stability compared to classic PC.

7. Biological Plausibility and Neuroscience Connections

iPC achieves two fundamental attributes for “brain-like” learning:

  • Locality: All synaptic updates depend solely on pre- and post-synaptic activities.
  • Autonomy: No external control signal or inference/learning demarcation is required; updates are self-timed and asynchronous.

Recent biophysical modeling maps predictive coding error nodes to apical dendrites, alleviating prior concerns over biological realism. iPC preserves all local update rules, ensuring compatibility with dendritic predictive-coding microcircuits and eliminating the need for phase scheduling.

This alignment between computational and biological learning principles, combined with stable convergence and empirical efficiency, positions iPC as a robust approach for neuroscience-inspired learning and neuromorphic algorithm design (Salvatori et al., 2022).

References (1)
