
Divide-and-Conquer Predictive Coding

Updated 1 February 2026
  • DCPC is a structured Bayesian inference method that uses graph-based factorization and local computations to perform efficient learning in hierarchical generative models.
  • The algorithm interleaves particle-based latent inference with local maximum-likelihood parameter updates, ensuring both prediction error minimization and convergence.
  • Empirical results on tasks like MNIST and CelebA demonstrate that DCPC achieves competitive performance while maintaining biological plausibility through localized updates.

Divide-and-Conquer Predictive Coding (DCPC) is a structured Bayesian inference algorithm designed for generative models with explicit graphical structure, enabling biologically plausible, fully local learning and inference. DCPC addresses the longstanding gap between predictive coding's theoretical appeal and its empirical performance in high-dimensional structured inference tasks by respecting posterior correlation structure and performing provable maximum-likelihood parameter updates, while relying exclusively on local computations between neighboring nodes (Sennesh et al., 2024).

1. Structured Generative Models and Model Architecture

DCPC operates on directed acyclic graphs (DAGs) comprising observed variables $x = \{x_i\}$ and latent variables $z = \{z_j\}$, parameterized by $\theta$. The joint density factors as

$$p(x, z; \theta) = \prod_{v \in x \cup z} p(v \mid \mathrm{pa}(v); \theta_v),$$

where $\mathrm{pa}(v)$ denotes the parents of node $v$ in the graph and each conditional $p(v \mid \mathrm{pa}(v); \theta_v)$ is locally parameterized. For hierarchical models, a typical example is the chain $z_1 \to z_2 \to x$, with each node conditionally dependent on its DAG parents.

This explicit factorization enables DCPC to exploit model structure during inference and learning, distinguishing it from prior predictive coding variants that often assume factorized posteriors or disregard higher-order dependencies.
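As a concrete illustration, the factorized joint for the chain $z_1 \to z_2 \to x$ can be evaluated as a sum of node-local log-conditionals. The sketch below assumes a hypothetical linear-Gaussian parameterization; the names `log_gauss`, `log_joint`, and the `theta` keys are illustrative, not taken from the paper:

```python
import numpy as np

def log_gauss(v, mean, prec):
    """Log density of an isotropic Gaussian N(v; mean, prec^{-1} I)."""
    d = v.size
    return 0.5 * d * np.log(prec / (2 * np.pi)) - 0.5 * prec * np.sum((v - mean) ** 2)

def log_joint(x, z1, z2, theta):
    """log p(x, z; theta) as a sum of node-local conditionals
    for the chain z1 -> z2 -> x (hypothetical linear-Gaussian model)."""
    lp = log_gauss(z1, np.zeros_like(z1), theta["tau1"])   # root prior p(z1)
    lp += log_gauss(z2, theta["W2"] @ z1, theta["tau2"])   # p(z2 | z1)
    lp += log_gauss(x, theta["Wx"] @ z2, theta["taux"])    # p(x | z2)
    return lp
```

Because each term touches only one node and its parents, every quantity DCPC later needs (scores, parameter gradients) decomposes the same way.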

2. Variational Objective and Theoretical Foundations

The algorithm optimizes the evidence lower bound (ELBO), the negative of the variational free energy:

$$\mathcal{L}(q, \theta) = \mathbb{E}_{q(z)}[\log p(x, z; \theta)] - \mathbb{E}_{q(z)}[\log q(z)],$$

which lower-bounds the log evidence: $\mathcal{L}(q, \theta) \leq \log p(x; \theta)$. DCPC employs a particle-based empirical variational distribution $q(z)$ with $K$ samples (particles), facilitating both accurate posterior approximation and practical gradient-based optimization.
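With a particle-based $q(z)$, the ELBO is estimable by a simple Monte Carlo average over the $K$ particles. A minimal sketch, where the callables `log_p` and `log_q` stand in for the model's log-joint and the variational log-density:

```python
import numpy as np

def elbo_particles(log_p, log_q, z_particles):
    """Monte Carlo ELBO estimate from K particles drawn from q:
    L ~= (1/K) * sum_k [log p(x, z^(k)) - log q(z^(k))]."""
    terms = [log_p(z) - log_q(z) for z in z_particles]
    return np.mean(terms)
```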

3. Divide-and-Conquer Algorithmic Updates

DCPC introduces an interleaved two-loop procedure comprising particle-based latent inference (inner loop) and local parameter learning (outer loop):

Inner Loop: Latent Inference

  • For each latent $z_j$, DCPC updates particles via coordinate-wise Langevin sampling. The complete conditional for $z_j$ is

$$p(z_j \mid x, z_{\setminus j}; \theta) \propto p(z_j \mid \mathrm{pa}(z_j); \theta_{z_j}) \prod_{c \in \mathrm{ch}(z_j)} p(c \mid \mathrm{pa}(c); \theta_c),$$

leveraging both parent and child nodes to compute the local 'prediction error' (score)

$$\epsilon_{z_j} = \nabla_{z_j} \log p(z_j \mid \mathrm{pa}(z_j)) + \sum_{c \in \mathrm{ch}(z_j)} \nabla_{z_j} \log p(c \mid \mathrm{pa}(c)).$$

Each of $K$ particles is updated via a Langevin proposal:

$$z_j^{\mathrm{new},(k)} \sim \mathcal{N}\!\left(z_j^{\mathrm{old},(k)} + \eta\, \epsilon_{z_j}^{(k)},\; 2\eta I\right),$$

followed by importance resampling with weights proportional to the ratio of the complete-conditional density to the proposal density. Repeated application yields approximate Gibbs samples from the conditional posterior.
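The inner-loop update for a single latent, Langevin proposal followed by importance resampling, can be sketched as below. This assumes a user-supplied score function and an (unnormalized) complete-conditional log-density, and uses multinomial resampling as one common choice; the function name and signature are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_resample_step(particles, score_fn, log_target, eta):
    """One DCPC-style particle update for a single latent (sketch):
    propose z' ~ N(z + eta * score(z), 2*eta*I), then importance-resample
    with weights proportional to target(z') / proposal_density(z')."""
    K, d = particles.shape
    scores = np.stack([score_fn(z) for z in particles])
    means = particles + eta * scores
    proposals = means + np.sqrt(2 * eta) * rng.normal(size=(K, d))
    # log proposal density up to a constant shared by all particles
    log_prop = -np.sum((proposals - means) ** 2, axis=1) / (4 * eta)
    log_w = np.array([log_target(z) for z in proposals]) - log_prop
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(K, size=K, p=w)  # multinomial resampling
    return proposals[idx]
```

Iterating this step drives the particle cloud toward the complete conditional, mirroring the approximate Gibbs behavior described above.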

Outer Loop: Parameter Learning

Given the current particles, the free energy is approximated as
$$F(\theta) \approx -\frac{1}{K} \sum_k \log p(x, z^{(k)}; \theta) + \text{const},$$

with local gradients:

$$\nabla_{\theta_v} F = -\frac{1}{K} \sum_{k=1}^K \nabla_{\theta_v} \log p\big(v^{(k)} \mid \mathrm{pa}(v)^{(k)}; \theta_v\big).$$

Each parameter vector $\theta_v$ is updated by local prediction errors arising from activity at $v$ and its neighbors only.
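For a linear-Gaussian node, the local gradient above takes a Hebbian-like form: a precision-weighted prediction error times presynaptic (parent) activity. A sketch under that assumption, with an illustrative function name and signature:

```python
import numpy as np

def local_param_grad(v_particles, pa_particles, W, b, tau):
    """Average gradient of log N(v; W @ pa + b, tau^{-1} I) over K particles,
    taken w.r.t. the node-local parameters (W, b) only (sketch)."""
    K = len(v_particles)
    gW = np.zeros_like(W)
    gb = np.zeros_like(b)
    for v, pa in zip(v_particles, pa_particles):
        err = tau * (v - W @ pa - b)   # precision-weighted prediction error
        gW += np.outer(err, pa)        # error x presynaptic activity
        gb += err
    return gW / K, gb / K
```

Note that the gradient vanishes exactly when the node's predictions match its particles, consistent with prediction-error minimization.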

4. Local Computation, Prediction Errors, and Closed-Form Solutions

All inference and learning computations in DCPC are local to nodes and their neighbors:

  • Prediction Errors: For each latent $z$, the local score is

$$\epsilon_{z} = \nabla_{z} \log p(z \mid \mathrm{pa}(z); \theta_z) + \sum_{c \in \mathrm{ch}(z)} \nabla_z \log p(c \mid \mathrm{pa}(c); \theta_c).$$

  • Precision (Gaussian Case): If $p(v \mid \mathrm{pa}(v))$ is Gaussian $\mathcal{N}(v; \mu_v(\mathrm{pa}(v); \theta_v), \tau_v^{-1} I)$, the score with respect to $v$ reduces to $\tau_v(\mu_v - v)$, a precision-weighted prediction error.
  • Closed-form Maximum Likelihood Updates: For linear-Gaussian nodes:

$$p(z \mid \mathrm{pa}(z)) = \mathcal{N}\big(z;\; W\,\mathrm{pa}(z) + b,\; \Lambda^{-1}\big),$$

maximum likelihood solutions for $W$ and $b$ are derived via empirical moments over particles:

$$W = \left[ \sum_k \mathbb{E}\big[z^{(k)} \mathrm{pa}(z)^{(k)\top}\big] - b \sum_k \mathbb{E}\big[\mathrm{pa}(z)^{(k)\top}\big] \right] \left[ \sum_k \mathbb{E}\big[\mathrm{pa}(z)^{(k)} \mathrm{pa}(z)^{(k)\top}\big] \right]^{-1},$$

$$b = \frac{1}{K} \sum_k \left( \mathbb{E}\big[z^{(k)}\big] - W\, \mathbb{E}\big[\mathrm{pa}(z)^{(k)}\big] \right).$$
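Solving these two coupled equations jointly is equivalent to ordinary least squares on centered particle moments. A sketch of that direct implementation, assuming the centered parent covariance is invertible (function name is illustrative):

```python
import numpy as np

def ml_linear_gaussian(z_particles, pa_particles):
    """Closed-form ML estimates of W, b in N(z; W @ pa + b, Lambda^{-1})
    from particle moments, via OLS on centered moments (sketch)."""
    Z = np.asarray(z_particles)    # shape (K, dz)
    P = np.asarray(pa_particles)   # shape (K, dp)
    zbar, pbar = Z.mean(axis=0), P.mean(axis=0)
    Szp = (Z - zbar).T @ (P - pbar)   # centered cross-moment sum
    Spp = (P - pbar).T @ (P - pbar)   # centered parent second moment sum
    W = Szp @ np.linalg.inv(Spp)
    b = zbar - W @ pbar
    return W, b
```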

5. Biological Plausibility: Locality and Minimal Global Coordination

DCPC provides a constructive proof of biological plausibility. All inference and parameter updates rely on local (pre- and post-synaptic) activity, prediction-error signals, and Hebbian-like weight adjustments. Complete-conditional sampling for each latent zz utilizes only the local score and Langevin proposals, with importance resampling, yielding approximate Gibbs samples. Parameter gradients similarly require no information beyond the immediately adjacent nodes. No global backpropagation or nonlocal gradient flow is required. This local structure is congruent with plausibility criteria from theoretical neuroscience, contrasting with the global error propagation in standard deep learning approaches.

6. Algorithmic Implementation and Pseudocode

A high-level summary of DCPC's algorithm follows, abstracted in Pyro-style pseudocode:

```
def DCPC_step(z_particles, θ, x, η, K):
    # 1. Inference sweep over latents
    for v in latent_nodes:
        # compute local prediction errors ε_v for each particle
        ε = [∇_v log p(v|pa(v); θ_v) + Σ_{c∈ch(v)} ∇_v log p(c|pa(c); θ_c)
             for each of K particles]
        # propose new particles via Langevin
        proposals = [z_old + η*ε[k] + sqrt(2η)*NormalNoise()
                     for k, z_old in enumerate(z_particles[v])]
        # compute importance weights
        weights = [p_complete_conditional(proposals[k]) /
                   N(proposals[k]; z_old + η*ε[k], 2ηI)
                   for k, z_old in enumerate(z_particles[v])]
        # resample K new particles
        z_particles[v] = resample(proposals, weights)
    # 2. Parameter update
    grads = {θ_v: 0 for v in x ∪ z}
    for k in range(K):
        for v in x ∪ z:
            grads[θ_v] += ∇_{θ_v} log p(v^{(k)}|pa(v)^{(k)}; θ_v)
    for v in x ∪ z:
        θ_v += η * (grads[θ_v] / K)
    return z_particles, θ
```
In practical implementations using probabilistic programming frameworks such as Pyro, each latent variable is registered with a local guide that performs Langevin proposals and resampling as above.

7. Empirical Performance and Benchmarking

DCPC demonstrates competitive or superior performance to previous predictive coding and particle-based inference methods across a range of tasks:

| Task | Baseline | Metric | DCPC | Baseline Value |
|---|---|---|---|---|
| DLGM on MNIST (K=4) | MCPC (K=4) [Oliviers et al. 2024] | NLL | 102.5 ± 0.01 | 144.6 ± 0.7 |
| DLGM on MNIST (K=4) | MCPC (K=4) [Oliviers et al. 2024] | MSE | 0.01 ± 7.2e-6 | 8.29e-2 ± 0.05e-2 |
| CelebA Face Gen (64×64) | LPC [Zahid et al. 2024] | FID | 96.0 ± 0.3 | ≈120 |
| PGD Head-to-Head (32×32) | PGD [Kuntz et al. 2023] | FID | 89.6 ± 0.6 | 100 ± 2.7 |

DCPC outperforms or matches MCPC, LPC, and PGD in negative log-likelihood, mean squared error, and Fréchet Inception Distance (FID), while strictly operating via local computation and respecting posterior dependencies (Sennesh et al., 2024).

A plausible implication is that DCPC closes the gap between biological plausibility and state-of-the-art variational inference in structured models, potentially enabling new classes of scalable, locally-learned Bayesian neural architectures.
