Divide-and-Conquer Predictive Coding
- DCPC is a structured Bayesian inference method that uses graph-based factorization and local computations to perform efficient learning in hierarchical generative models.
- The algorithm interleaves particle-based latent inference with local maximum-likelihood parameter updates, combining prediction-error minimization with provably convergent learning.
- Empirical results on tasks like MNIST and CelebA demonstrate that DCPC achieves competitive performance while maintaining biological plausibility through localized updates.
Divide-and-Conquer Predictive Coding (DCPC) is a structured Bayesian inference algorithm designed for generative models with explicit graphical structure, enabling biologically plausible, fully local learning and inference. DCPC addresses the longstanding gap between predictive coding's theoretical appeal and its empirical performance in high-dimensional structured inference tasks by respecting posterior correlation structure and performing provable maximum-likelihood parameter updates, while relying exclusively on local computations between neighboring nodes (Sennesh et al., 2024).
1. Structured Generative Models and Model Architecture
DCPC operates on directed acyclic graphs (DAGs) comprising observed variables $x$ and latent variables $z$, parameterized by $\theta$. The joint density factors as

$$p_\theta(x, z) = \prod_{v \in x \cup z} p\big(v \mid \mathrm{pa}(v); \theta_v\big),$$

where $\mathrm{pa}(v)$ denotes the parents of node $v$ in the graph and each conditional is locally parameterized. For hierarchical models, a typical example is the chain $z_L \to z_{L-1} \to \cdots \to z_1 \to x$, with each node conditionally dependent on its DAG parents.
This explicit factorization enables DCPC to exploit model structure during inference and learning, distinguishing it from prior predictive coding variants that often assume factorized posteriors or disregard higher-order dependencies.
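As an illustration, the factorized joint of a toy two-level linear-Gaussian chain can be evaluated as a sum of local log-conditionals, one per node. All shapes and weight names (`W1`, `W0`) below are hypothetical, not from the paper:

```python
import numpy as np

# Toy two-level linear-Gaussian chain z2 -> z1 -> x.
rng = np.random.default_rng(0)

def log_normal(v, mean, var):
    # isotropic Gaussian log-density, summed over dimensions
    return float(-0.5 * np.sum((v - mean) ** 2 / var + np.log(2 * np.pi * var)))

def log_joint(x, z1, z2, W1, W0):
    # p(x, z1, z2) = p(z2) p(z1 | z2) p(x | z1): one local factor per node
    return (log_normal(z2, 0.0, 1.0)
            + log_normal(z1, W1 @ z2, 1.0)
            + log_normal(x, W0 @ z1, 1.0))

W1 = rng.normal(size=(3, 2))
W0 = rng.normal(size=(5, 3))
z2 = rng.normal(size=2)
z1 = rng.normal(size=3)
x = rng.normal(size=5)
lp = log_joint(x, z1, z2, W1, W0)
```

Because each factor depends only on a node and its parents, every term in the sum can be computed locally at that node.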
2. Variational Objective and Theoretical Foundations
The algorithm optimizes the evidence lower bound (ELBO), also known as the negative variational free energy:

$$\mathcal{L}(\theta, q) = \mathbb{E}_{q(z)}\big[\log p_\theta(x, z) - \log q(z)\big] \le \log p_\theta(x),$$

maximized subject to $q$ lying in the chosen variational family. DCPC employs a particle-based empirical variational distribution $q(z) = \frac{1}{K} \sum_{k=1}^{K} \delta_{z^{(k)}}(z)$ with $K$ samples (particles), facilitating both accurate posterior approximation and practical gradient-based optimization.
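A minimal sketch of such a particle-based ELBO estimate, using a toy one-latent Gaussian model (the model, variational mean `m`, and scale `s` are illustrative choices, not the paper's):

```python
import numpy as np

# Toy model: p(z) = N(0, 1), p(x | z) = N(z, 1), with Gaussian q(z) = N(m, s^2).
rng = np.random.default_rng(1)

def log_n(v, mean, var):
    return -0.5 * ((v - mean) ** 2 / var + np.log(2 * np.pi * var))

def elbo_estimate(x, m, s, K=1000):
    z = m + s * rng.normal(size=K)                  # K particles drawn from q
    log_p = log_n(z, 0.0, 1.0) + log_n(x, z, 1.0)   # log p(x, z) per particle
    log_q = log_n(z, m, s ** 2)
    return np.mean(log_p - log_q)                   # E_q[log p - log q] <= log p(x)

x = 0.5
# With q set to the exact posterior N(x/2, 1/2), the bound is tight:
# every particle contributes exactly log p(x) = log N(x; 0, 2).
est = elbo_estimate(x, x / 2, np.sqrt(0.5))
exact = log_n(x, 0.0, 2.0)
```

Setting `q` to the exact posterior makes the per-particle summand constant, which is a useful sanity check on any ELBO implementation.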
3. Divide-and-Conquer Algorithmic Updates
DCPC introduces an interleaved two-loop procedure comprising a particle-based latent inference (inner loop) and local parameter learning (outer loop):
Inner Loop: Latent Inference
- For each latent $z_v$, DCPC updates particles via coordinate-wise Langevin sampling. The complete conditional for $z_v$ is

$$p(z_v \mid x, z_{-v}; \theta) \propto p\big(z_v \mid \mathrm{pa}(v); \theta_v\big) \prod_{u \in \mathrm{ch}(v)} p\big(u \mid \mathrm{pa}(u); \theta_u\big),$$

leveraging both parent and child nodes to compute the local 'prediction error' (score)

$$\varepsilon_v = \nabla_{z_v} \log p\big(z_v \mid \mathrm{pa}(v); \theta_v\big) + \sum_{u \in \mathrm{ch}(v)} \nabla_{z_v} \log p\big(u \mid \mathrm{pa}(u); \theta_u\big).$$

Each of the $K$ particles is updated via a Langevin proposal,

$$z_v^{(k)\,\prime} = z_v^{(k)} + \eta\, \varepsilon_v^{(k)} + \sqrt{2\eta}\, \xi^{(k)}, \qquad \xi^{(k)} \sim \mathcal{N}(0, I),$$

followed by importance resampling with weights proportional to the ratio of model likelihoods to proposal densities. Repeated application yields approximate Gibbs samples from the conditional posterior.
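The inner loop can be sketched for a single scalar latent with standard-normal prior and unit-variance likelihood, whose posterior $\mathcal{N}(x/2, 1/2)$ is available in closed form for comparison. The step size, particle count, and iteration budget below are illustrative:

```python
import numpy as np

# One latent z with prior N(0, 1) and likelihood x ~ N(z, 1).
rng = np.random.default_rng(2)

def score(z, x):
    # local prediction error: d/dz [log p(z) + log p(x | z)]
    return -z + (x - z)

def inner_step(particles, x, eta=0.05):
    eps = score(particles, x)
    noise = rng.normal(size=particles.shape)
    proposals = particles + eta * eps + np.sqrt(2 * eta) * noise  # Langevin move
    # importance weights: unnormalized target over proposal density
    log_target = -0.5 * proposals ** 2 - 0.5 * (x - proposals) ** 2
    log_prop = -(proposals - particles - eta * eps) ** 2 / (4 * eta)
    log_w = log_target - log_prop
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)    # resample
    return proposals[idx]

x = 2.0
particles = rng.normal(size=256)
for _ in range(200):
    particles = inner_step(particles, x)
```

Note that both the score and the weights touch only the latent and its Markov blanket, which is what makes the sweep local.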
Outer Loop: Parameter Learning
- Parameter updates are performed by stochastic gradient descent on the negative ELBO,

$$\theta_v \leftarrow \theta_v + \eta_\theta\, \nabla_{\theta_v} \mathcal{L},$$

with local gradients:

$$\nabla_{\theta_v} \mathcal{L} = \frac{1}{K} \sum_{k=1}^{K} \nabla_{\theta_v} \log p\big(v^{(k)} \mid \mathrm{pa}(v)^{(k)}; \theta_v\big).$$

Each parameter vector $\theta_v$ is updated by local prediction errors arising from activity at $v$ and its neighbors only.
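A sketch of this under toy assumptions: a single edge whose weight is fit purely from the node's own prediction error (the model `x = w*z + noise`, learning rate, and particle count are all illustrative):

```python
import numpy as np

# Fit one weight w in x = w * z + noise, noise ~ N(0, 0.01),
# from K particle pairs, using only the local prediction error.
rng = np.random.default_rng(3)

w_true = 1.5
K = 64
z = rng.normal(size=K)                    # K particles of the parent node
x = w_true * z + 0.1 * rng.normal(size=K)

w, eta = 0.0, 0.1
for _ in range(500):
    err = x - w * z                       # local prediction error at node x
    w += eta * np.mean(err * z)           # grad of (1/K) sum_k log p(x_k | z_k; w)
```

The update is Hebbian in form: a presynaptic activity `z` multiplied by a postsynaptic error `err`, averaged over particles.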
4. Local Computation, Prediction Errors, and Closed-Form Solutions
All inference and learning computations in DCPC are local to nodes and their neighbors:
- Prediction Errors: For each latent $z_v$, the local score is

$$\varepsilon_v = \nabla_{z_v} \log p\big(z_v \mid \mathrm{pa}(v); \theta_v\big) + \sum_{u \in \mathrm{ch}(v)} \nabla_{z_v} \log p\big(u \mid \mathrm{pa}(u); \theta_u\big).$$

- Precision (Gaussian Case): If $p(v \mid \mathrm{pa}(v))$ is Gaussian $\mathcal{N}(v; \mu_v, \Lambda_v^{-1})$, the score reduces to $-\Lambda_v (v - \mu_v)$, where $\Lambda_v$ is the precision.
- Closed-Form Maximum-Likelihood Updates: For linear-Gaussian nodes $v = W\, \mathrm{pa}(v) + b + \xi$ with $\xi \sim \mathcal{N}(0, \Lambda^{-1})$, maximum-likelihood solutions for $W$ and $b$ follow from empirical moments over particles:

$$W^{\star} = \widehat{\mathrm{Cov}}\big(v, \mathrm{pa}(v)\big)\, \widehat{\mathrm{Cov}}\big(\mathrm{pa}(v)\big)^{-1}, \qquad b^{\star} = \bar{v} - W^{\star}\, \overline{\mathrm{pa}(v)}.$$
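A sketch of the moment-based solution for one linear-Gaussian node, assuming $K$ particle pairs for the node and its parent (shapes, noise level, and names below are illustrative):

```python
import numpy as np

# Recover W, b in v = W u + b + noise from K particle pairs (u_k, v_k).
rng = np.random.default_rng(5)

K, du, dv = 2000, 2, 3
W_true = rng.normal(size=(dv, du))
b_true = rng.normal(size=dv)
u = rng.normal(size=(K, du))
v = u @ W_true.T + b_true + 0.01 * rng.normal(size=(K, dv))

u_bar, v_bar = u.mean(axis=0), v.mean(axis=0)
S_uu = (u - u_bar).T @ (u - u_bar) / K     # empirical parent covariance
S_vu = (v - v_bar).T @ (u - u_bar) / K     # empirical cross-covariance
W_hat = S_vu @ np.linalg.inv(S_uu)         # W* from second moments
b_hat = v_bar - W_hat @ u_bar              # b* from first moments
```

Only the node's own particles and its parent's particles enter the moments, so the closed-form update is as local as the gradient one.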
5. Biological Plausibility: Locality and Minimal Global Coordination
DCPC provides a constructive proof of biological plausibility. All inference and parameter updates rely on local (pre- and post-synaptic) activity, prediction-error signals, and Hebbian-like weight adjustments. Complete-conditional sampling for each latent utilizes only the local score and Langevin proposals, with importance resampling, yielding approximate Gibbs samples. Parameter gradients similarly require no information beyond the immediately adjacent nodes. No global backpropagation or nonlocal gradient flow is required. This local structure is congruent with plausibility criteria from theoretical neuroscience, contrasting with the global error propagation in standard deep learning approaches.
6. Algorithmic Implementation and Pseudocode
A high-level summary of DCPC's algorithm follows, abstracted in Pyro-style pseudocode:
```python
def DCPC_step(z_particles, θ, x, η, K):
    # 1. Inference sweep over latents
    for v in latent_nodes:
        # compute local prediction errors ε_v for each particle
        ε = [∇_v log p(v | pa(v); θ_v)
             + Σ_ch ∇_v log p(ch | pa(ch); θ_ch)
             for each of K particles]
        # propose new particles via Langevin
        proposals = [z_old + η*ε[k] + sqrt(2η)*NormalNoise()
                     for k, z_old in enumerate(z_particles[v])]
        # compute importance weights (target over proposal density)
        weights = [p_complete_conditional(proposals[k])
                   / N(proposals[k]; z_old + η*ε[k], 2ηI)
                   for k, z_old in enumerate(z_particles[v])]
        # resample K new particles
        z_particles[v] = resample(proposals, weights)

    # 2. Parameter update by averaged local gradients
    grads = {θ_v: 0 for v in x ∪ z}
    for k in range(K):
        for v in x ∪ z:
            grads[θ_v] += ∇_{θ_v} log p(v^(k) | pa(v)^(k); θ_v)
    for v in x ∪ z:
        θ_v += η * grads[θ_v] / K
    return z_particles, θ
```
7. Empirical Performance and Benchmarking
DCPC demonstrates competitive or superior performance to previous predictive coding and particle-based inference methods across a range of tasks:
| Task | Baseline | Metric | DCPC | Baseline Value |
|---|---|---|---|---|
| DLGM on MNIST (K=4) | MCPC (K=4) [Oliviers et al. 2024] | NLL | 102.5±0.01 | 144.6±0.7 |
| DLGM on MNIST (K=4) | MCPC (K=4) [Oliviers et al. 2024] | MSE | 0.01±7.2e-6 | 8.29e-2±0.05e-2 |
| CelebA Face Gen (64×64) | LPC [Zahid et al. 2024] | FID | 96.0±0.3 | ≈120 |
| PGD Head-to-Head (32×32) | PGD [Kuntz et al. 2023] | FID | 89.6±0.6 | 100±2.7 |
DCPC outperforms or matches MCPC, LPC, and PGD in negative log-likelihood, mean squared error, and Fréchet Inception Distance (FID), while strictly operating via local computation and respecting posterior dependencies (Sennesh et al., 2024).
A plausible implication is that DCPC closes the gap between biological plausibility and state-of-the-art variational inference in structured models, potentially enabling new classes of scalable, locally-learned Bayesian neural architectures.