Divide-and-Conquer Predictive Coding
- DCPC is a structured Bayesian inference method that uses graph-based factorization and local computations to perform efficient learning in hierarchical generative models.
- The algorithm interleaves particle-based latent inference with local maximum-likelihood parameter updates, combining prediction-error minimization with provably convergent learning.
- Empirical results on tasks like MNIST and CelebA demonstrate that DCPC achieves competitive performance while maintaining biological plausibility through localized updates.
Divide-and-Conquer Predictive Coding (DCPC) is a structured Bayesian inference algorithm designed for generative models with explicit graphical structure, enabling biologically plausible, fully local learning and inference. DCPC addresses the longstanding gap between predictive coding's theoretical appeal and its empirical performance in high-dimensional structured inference tasks by respecting posterior correlation structure and performing provable maximum-likelihood parameter updates, while relying exclusively on local computations between neighboring nodes (Sennesh et al., 2024).
1. Structured Generative Models and Model Architecture
DCPC operates on directed acyclic graphs (DAGs) comprising observed variables $x$ and latent variables $z$, parameterized by $\theta$. The joint density factors as

$$p_\theta(x, z) = \prod_{v \in x \cup z} p\big(v \mid \mathrm{pa}(v); \theta_v\big),$$

where $\mathrm{pa}(v)$ denotes the parents of node $v$ in the graph and each conditional is locally parameterized. For hierarchical models, a typical example is the chain $z_L \to z_{L-1} \to \cdots \to z_1 \to x$, with each node conditionally dependent on its DAG parents.
This explicit factorization enables DCPC to exploit model structure during inference and learning, distinguishing it from prior predictive coding variants that often assume factorized posteriors or disregard higher-order dependencies.
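As an illustration, the factorized joint of a toy two-level linear-Gaussian chain can be evaluated as a sum of local log-conditionals, one per node. All shapes and weight names (`W1`, `W0`) below are hypothetical, not from the paper:

```python
import numpy as np

# Toy two-level linear-Gaussian chain z2 -> z1 -> x.
rng = np.random.default_rng(0)

def log_normal(v, mean, var):
    # isotropic Gaussian log-density, summed over dimensions
    return float(-0.5 * np.sum((v - mean) ** 2 / var + np.log(2 * np.pi * var)))

def log_joint(x, z1, z2, W1, W0):
    # p(x, z1, z2) = p(z2) p(z1 | z2) p(x | z1): one local factor per node
    return (log_normal(z2, 0.0, 1.0)
            + log_normal(z1, W1 @ z2, 1.0)
            + log_normal(x, W0 @ z1, 1.0))

W1 = rng.normal(size=(3, 2))
W0 = rng.normal(size=(5, 3))
z2 = rng.normal(size=2)
z1 = rng.normal(size=3)
x = rng.normal(size=5)
lp = log_joint(x, z1, z2, W1, W0)
```

Because each factor depends only on a node and its parents, every term in the sum can be computed locally at that node.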
2. Variational Objective and Theoretical Foundations
The algorithm optimizes the evidence lower bound (ELBO), also known as the negative variational free energy:

$$\mathcal{L}(\theta, q) = \mathbb{E}_{q(z)}\big[\log p_\theta(x, z) - \log q(z)\big] \le \log p_\theta(x),$$

maximized subject to $q$ lying in the chosen variational family. DCPC employs a particle-based empirical variational distribution $q(z) = \frac{1}{K} \sum_{k=1}^{K} \delta_{z^{(k)}}(z)$ with $K$ samples (particles), facilitating both accurate posterior approximation and practical gradient-based optimization.
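A minimal sketch of such a particle-based ELBO estimate, using a toy one-latent Gaussian model (the model, variational mean `m`, and scale `s` are illustrative choices, not the paper's):

```python
import numpy as np

# Toy model: p(z) = N(0, 1), p(x | z) = N(z, 1), with Gaussian q(z) = N(m, s^2).
rng = np.random.default_rng(1)

def log_n(v, mean, var):
    return -0.5 * ((v - mean) ** 2 / var + np.log(2 * np.pi * var))

def elbo_estimate(x, m, s, K=1000):
    z = m + s * rng.normal(size=K)                  # K particles drawn from q
    log_p = log_n(z, 0.0, 1.0) + log_n(x, z, 1.0)   # log p(x, z) per particle
    log_q = log_n(z, m, s ** 2)
    return np.mean(log_p - log_q)                   # E_q[log p - log q] <= log p(x)

x = 0.5
# With q set to the exact posterior N(x/2, 1/2), the bound is tight:
# every particle contributes exactly log p(x) = log N(x; 0, 2).
est = elbo_estimate(x, x / 2, np.sqrt(0.5))
exact = log_n(x, 0.0, 2.0)
```

Setting `q` to the exact posterior makes the per-particle summand constant, which is a useful sanity check on any ELBO implementation.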
3. Divide-and-Conquer Algorithmic Updates
DCPC introduces an interleaved two-loop procedure comprising a particle-based latent inference (inner loop) and local parameter learning (outer loop):
Inner Loop: Latent Inference
- For each latent $z_v$, DCPC updates particles via coordinate-wise Langevin sampling. The complete conditional for $z_v$ is

$$p(z_v \mid x, z_{-v}; \theta) \propto p\big(z_v \mid \mathrm{pa}(v); \theta_v\big) \prod_{u \in \mathrm{ch}(v)} p\big(u \mid \mathrm{pa}(u); \theta_u\big),$$

leveraging both parent and child nodes to compute the local 'prediction error' (score)

$$\varepsilon_v = \nabla_{z_v} \log p\big(z_v \mid \mathrm{pa}(v); \theta_v\big) + \sum_{u \in \mathrm{ch}(v)} \nabla_{z_v} \log p\big(u \mid \mathrm{pa}(u); \theta_u\big).$$

Each of the $K$ particles is updated via a Langevin proposal,

$$z_v^{(k)\,\prime} = z_v^{(k)} + \eta\, \varepsilon_v^{(k)} + \sqrt{2\eta}\, \xi^{(k)}, \qquad \xi^{(k)} \sim \mathcal{N}(0, I),$$

followed by importance resampling with weights proportional to the ratio of model likelihoods to proposal densities. Repeated application yields approximate Gibbs samples from the conditional posterior.
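The inner loop can be sketched for a single scalar latent with standard-normal prior and unit-variance likelihood, whose posterior $\mathcal{N}(x/2, 1/2)$ is available in closed form for comparison. The step size, particle count, and iteration budget below are illustrative:

```python
import numpy as np

# One latent z with prior N(0, 1) and likelihood x ~ N(z, 1).
rng = np.random.default_rng(2)

def score(z, x):
    # local prediction error: d/dz [log p(z) + log p(x | z)]
    return -z + (x - z)

def inner_step(particles, x, eta=0.05):
    eps = score(particles, x)
    noise = rng.normal(size=particles.shape)
    proposals = particles + eta * eps + np.sqrt(2 * eta) * noise  # Langevin move
    # importance weights: unnormalized target over proposal density
    log_target = -0.5 * proposals ** 2 - 0.5 * (x - proposals) ** 2
    log_prop = -(proposals - particles - eta * eps) ** 2 / (4 * eta)
    log_w = log_target - log_prop
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)    # resample
    return proposals[idx]

x = 2.0
particles = rng.normal(size=256)
for _ in range(200):
    particles = inner_step(particles, x)
```

Note that both the score and the weights touch only the latent and its Markov blanket, which is what makes the sweep local.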
Outer Loop: Parameter Learning
- Parameter updates are performed by stochastic gradient descent on the negative ELBO,

$$\theta_v \leftarrow \theta_v + \eta_\theta\, \nabla_{\theta_v} \mathcal{L},$$

with local gradients:

$$\nabla_{\theta_v} \mathcal{L} = \frac{1}{K} \sum_{k=1}^{K} \nabla_{\theta_v} \log p\big(v^{(k)} \mid \mathrm{pa}(v)^{(k)}; \theta_v\big).$$

Each parameter vector $\theta_v$ is updated by local prediction errors arising from activity at $v$ and its neighbors only.
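A sketch of this under toy assumptions: a single edge whose weight is fit purely from the node's own prediction error (the model `x = w*z + noise`, learning rate, and particle count are all illustrative):

```python
import numpy as np

# Fit one weight w in x = w * z + noise, noise ~ N(0, 0.01),
# from K particle pairs, using only the local prediction error.
rng = np.random.default_rng(3)

w_true = 1.5
K = 64
z = rng.normal(size=K)                    # K particles of the parent node
x = w_true * z + 0.1 * rng.normal(size=K)

w, eta = 0.0, 0.1
for _ in range(500):
    err = x - w * z                       # local prediction error at node x
    w += eta * np.mean(err * z)           # grad of (1/K) sum_k log p(x_k | z_k; w)
```

The update is Hebbian in form: a presynaptic activity `z` multiplied by a postsynaptic error `err`, averaged over particles.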
4. Local Computation, Prediction Errors, and Closed-Form Solutions
All inference and learning computations in DCPC are local to nodes and their neighbors:
- Prediction Errors: For each latent $z_v$, the local score is

$$\varepsilon_v = \nabla_{z_v} \log p\big(z_v \mid \mathrm{pa}(v); \theta_v\big) + \sum_{u \in \mathrm{ch}(v)} \nabla_{z_v} \log p\big(u \mid \mathrm{pa}(u); \theta_u\big).$$

- Precision (Gaussian Case): If $p(v \mid \mathrm{pa}(v))$ is Gaussian $\mathcal{N}(v; \mu_v, \Lambda_v^{-1})$, the score reduces to $-\Lambda_v (v - \mu_v)$, where $\Lambda_v$ is the precision.
- Closed-Form Maximum-Likelihood Updates: For linear-Gaussian nodes $v = W\, \mathrm{pa}(v) + b + \xi$ with $\xi \sim \mathcal{N}(0, \Lambda^{-1})$, maximum-likelihood solutions for $W$ and $b$ follow from empirical moments over particles:

$$W^{\star} = \widehat{\mathrm{Cov}}\big(v, \mathrm{pa}(v)\big)\, \widehat{\mathrm{Cov}}\big(\mathrm{pa}(v)\big)^{-1}, \qquad b^{\star} = \bar{v} - W^{\star}\, \overline{\mathrm{pa}(v)}.$$
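A sketch of the moment-based solution for one linear-Gaussian node, assuming $K$ particle pairs for the node and its parent (shapes, noise level, and names below are illustrative):

```python
import numpy as np

# Recover W, b in v = W u + b + noise from K particle pairs (u_k, v_k).
rng = np.random.default_rng(5)

K, du, dv = 2000, 2, 3
W_true = rng.normal(size=(dv, du))
b_true = rng.normal(size=dv)
u = rng.normal(size=(K, du))
v = u @ W_true.T + b_true + 0.01 * rng.normal(size=(K, dv))

u_bar, v_bar = u.mean(axis=0), v.mean(axis=0)
S_uu = (u - u_bar).T @ (u - u_bar) / K     # empirical parent covariance
S_vu = (v - v_bar).T @ (u - u_bar) / K     # empirical cross-covariance
W_hat = S_vu @ np.linalg.inv(S_uu)         # W* from second moments
b_hat = v_bar - W_hat @ u_bar              # b* from first moments
```

Only the node's own particles and its parent's particles enter the moments, so the closed-form update is as local as the gradient one.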
5. Biological Plausibility: Locality and Minimal Global Coordination
DCPC provides a constructive proof of biological plausibility. All inference and parameter updates rely on local (pre- and post-synaptic) activity, prediction-error signals, and Hebbian-like weight adjustments. Complete-conditional sampling for each latent utilizes only the local score and Langevin proposals, with importance resampling, yielding approximate Gibbs samples. Parameter gradients similarly require no information beyond the immediately adjacent nodes. No global backpropagation or nonlocal gradient flow is required. This local structure is congruent with plausibility criteria from theoretical neuroscience, contrasting with the global error propagation in standard deep learning approaches.
6. Algorithmic Implementation and Pseudocode
A high-level summary of DCPC's algorithm follows, abstracted in Pyro-style pseudocode:
```python
def DCPC_step(z_particles, θ, x, η, K):
    # 1. Inference sweep over latents
    for v in latent_nodes:
        # compute local prediction errors ε_v for each particle
        ε = [∇_v log p(v | pa(v); θ_v)
             + Σ_ch ∇_v log p(ch | pa(ch); θ_ch)
             for each of K particles]
        # propose new particles via Langevin
        proposals = [z_old + η*ε[k] + sqrt(2η)*NormalNoise()
                     for k, z_old in enumerate(z_particles[v])]
        # compute importance weights (target over proposal density)
        weights = [p_complete_conditional(proposals[k])
                   / N(proposals[k]; z_old + η*ε[k], 2ηI)
                   for k, z_old in enumerate(z_particles[v])]
        # resample K new particles
        z_particles[v] = resample(proposals, weights)

    # 2. Parameter update by averaged local gradients
    grads = {θ_v: 0 for v in x ∪ z}
    for k in range(K):
        for v in x ∪ z:
            grads[θ_v] += ∇_{θ_v} log p(v^(k) | pa(v)^(k); θ_v)
    for v in x ∪ z:
        θ_v += η * grads[θ_v] / K
    return z_particles, θ
```
7. Empirical Performance and Benchmarking
DCPC demonstrates competitive or superior performance to previous predictive coding and particle-based inference methods across a range of tasks:
| Task | Baseline | Metric | DCPC | Baseline Value |
|---|---|---|---|---|
| DLGM on MNIST (K=4) | MCPC (K=4) [Oliviers et al. 2024] | NLL | 102.5±0.01 | 144.6±0.7 |
| DLGM on MNIST (K=4) | MCPC (K=4) [Oliviers et al. 2024] | MSE | 0.01±7.2e-6 | 8.29e-2±0.05e-2 |
| CelebA Face Gen (64×64) | LPC [Zahid et al. 2024] | FID | 96.0±0.3 | ≈120 |
| PGD Head-to-Head (32×32) | PGD [Kuntz et al. 2023] | FID | 89.6±0.6 | 100±2.7 |
DCPC outperforms or matches MCPC, LPC, and PGD in negative log-likelihood, mean squared error, and Fréchet Inception Distance (FID), while strictly operating via local computation and respecting posterior dependencies (Sennesh et al., 2024).
A plausible implication is that DCPC closes the gap between biological plausibility and state-of-the-art variational inference in structured models, potentially enabling new classes of scalable, locally-learned Bayesian neural architectures.