Divide-and-Conquer Predictive Coding: a structured Bayesian inference algorithm (2408.05834v2)

Published 11 Aug 2024 in stat.ML, cs.AI, cs.LG, and q-bio.NC

Abstract: Unexpected stimuli induce "error" or "surprise" signals in the brain. The theory of predictive coding promises to explain these observations in terms of Bayesian inference by suggesting that the cortex implements variational inference in a probabilistic graphical model. However, when applied to machine learning tasks, this family of algorithms has yet to perform on par with other variational approaches in high-dimensional, structured inference problems. To address this, we introduce a novel predictive coding algorithm for structured generative models, that we call divide-and-conquer predictive coding (DCPC). DCPC differs from other formulations of predictive coding, as it respects the correlation structure of the generative model and provably performs maximum-likelihood updates of model parameters, all without sacrificing biological plausibility. Empirically, DCPC achieves better numerical performance than competing algorithms and provides accurate inference in a number of problems not previously addressed with predictive coding. We provide an open implementation of DCPC in Pyro on Github.

Summary

  • The paper introduces DCPC, a novel algorithm that refines predictive coding with a divide-and-conquer approach for structured Bayesian inference.
  • It overcomes limitations of traditional Gaussian models by using Monte Carlo sampling and local updates to capture complex, multimodal data distributions.
  • Empirical results on datasets like MNIST and CelebA demonstrate lower NLL, MSE, and FID scores, highlighting DCPC’s effectiveness and biological plausibility.

Divide-and-Conquer Predictive Coding: A Structured Bayesian Inference Algorithm

This paper introduces Divide-and-Conquer Predictive Coding (DCPC), a novel algorithm designed to address structured variational inference problems in machine learning, an area where traditional predictive coding methods have previously struggled. The paper is a substantial contribution as it revisits and refines the core principles of predictive coding by incorporating more sophisticated sampling mechanisms and leveraging the structural properties of probabilistic graphical models.

Key Innovations and Algorithmic Details

The DCPC algorithm departs from classical predictive coding by introducing several important innovations that enhance its applicability and performance in high-dimensional and complex generative models:

  1. Structured Generative Models: One salient issue with previous predictive coding approaches is their limitation to Gaussian generative models. DCPC overcomes this by employing Monte Carlo samples to represent complex, multimodal distributions that better capture the correlations and dependencies inherent in the data.
  2. Local Prediction Errors: The novelty of DCPC lies in its ability to respect the correlation structure of the generative model. The algorithm achieves this by breaking the inference process into local updates for individual random variables. This "divide-and-conquer" strategy allows DCPC to perform maximum-likelihood updates of model parameters locally, ensuring biological plausibility while enhancing performance.
  3. Integration with Sequential Monte Carlo (SMC): SMC methods are used to carry out the divide-and-conquer updates. DCPC uses coordinate updates informed by unadjusted Langevin proposals parameterized by prediction errors, which are gradients of the complete conditional log-likelihoods. Although each update is local, together they accurately approximate the joint posterior distribution; a minimal sketch of such a coordinate update appears after this list.
  4. Open Implementation: The authors provide an implementation of DCPC in Pyro, a deep probabilistic programming language, which ensures reproducibility and allows for wider adoption and further experimentation by the research community.
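
The coordinate update in item 3 can be made concrete with a short sketch. The snippet below is not the authors' Pyro implementation; it is a minimal PyTorch illustration, with placeholder names and step size, of an unadjusted Langevin proposal for a single latent coordinate, where the local prediction error is the gradient of that coordinate's complete conditional log-density. The particle population and resampling machinery of SMC is omitted.

```python
import torch

def langevin_coordinate_update(z_i, log_complete_conditional, step_size=1e-2):
    """One unadjusted Langevin proposal for a single latent coordinate z_i.

    log_complete_conditional(z_i) should return log p(z_i | z_{-i}, x, theta)
    with every other coordinate held fixed; its gradient plays the role of
    the local prediction error.
    """
    z_i = z_i.detach().requires_grad_(True)
    log_prob = log_complete_conditional(z_i)
    # Local prediction error: gradient of the complete conditional log-density.
    (error,) = torch.autograd.grad(log_prob.sum(), z_i)
    # Drift along the prediction error, plus Gaussian exploration noise.
    noise = torch.randn_like(z_i)
    step = torch.as_tensor(step_size)
    return (z_i + step * error + torch.sqrt(2.0 * step) * noise).detach()

# Toy usage with a standard-normal complete conditional (purely illustrative).
z = torch.zeros(4)
for _ in range(100):
    z = langevin_coordinate_update(z, lambda v: -0.5 * (v ** 2).sum())
```

Applied to each latent variable in turn, with the other coordinates held at their current values, such updates realize the divide-and-conquer structure described above.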

Numerical Evaluation and Empirical Results

DCPC was evaluated on several benchmark datasets and compared with other variational inference algorithms, most notably Monte Carlo Predictive Coding (MCPC) and Langevin Predictive Coding (LPC). The results demonstrate DCPC's superior performance:

  • MNIST and Fashion MNIST: On these datasets, DCPC achieved lower negative log-likelihood (NLL) and mean squared error (MSE) than MCPC. For MNIST, the NLL was reduced from 144.6 to 102.5, and the MSE was significantly lower at 0.01 ± 7.2×10⁻⁶.
  • CelebA Dataset: In more complex settings involving image generation with representation learning, DCPC produced better sample quality, as evidenced by lower Fréchet Inception Distance (FID) scores than LPC, highlighting its ability to model complex data distributions effectively.

Theoretical Contributions and Biological Plausibility

The theoretical groundwork for DCPC is well-established in the paper through several key proofs:

  1. Local Coordinate Updates: Theoretical analysis confirms that the local coordinate updates performed by DCPC are equivalent to sampling from the true complete conditional distributions, a critical property for maintaining the accuracy of inferred posterior distributions.
  2. Parameter Updates: The derivation showing that the gradient of the variational free energy factorizes into local gradients means that parameter learning requires only local computations, aligning DCPC with the principles of biological plausibility; a schematic illustration of this factorization follows this list.
  3. Extension to Discrete Spaces: The paper also sketches a mathematical foundation for extending DCPC to discrete random variables using finite differences and Newton’s series, indicating the broad applicability of the approach.
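
Point 2 can be illustrated schematically. The sketch below is not the paper's derivation; it uses an invented two-factor model (a Gaussian prior and a linear-Gaussian likelihood, each with its own parameter) to show that differentiating the joint log-density splits into per-factor gradients, so each parameter block receives only the gradient of the factor it appears in.

```python
import torch

# Hypothetical two-factor model p(z) p(x | z); each factor owns its own parameter.
mu = torch.zeros(1, requires_grad=True)   # mean of the Gaussian prior p(z)
w = torch.ones(1, requires_grad=True)     # weight of the likelihood p(x | z)

def per_factor_log_densities(x, z):
    """Log-density of each factor; each term involves only one parameter block."""
    log_prior = -0.5 * ((z - mu) ** 2).sum()
    log_likelihood = -0.5 * ((x - w * z) ** 2).sum()
    return log_prior, log_likelihood

z = torch.randn(16)                        # stand-in for particles produced by inference
x = w.detach() * z + 0.1 * torch.randn(16)

log_prior, log_likelihood = per_factor_log_densities(x, z)
(-(log_prior + log_likelihood)).backward()

# mu.grad comes only from the prior factor and w.grad only from the likelihood
# factor: the joint gradient decomposes into local, per-factor contributions.
print(mu.grad, w.grad)
```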

Future Implications

DCPC sets the stage for several promising future developments in AI and machine learning:

  • Scalability and Efficiency: Future work could explore the use of momentum-based preconditioning and optimize the learning rate schedules to enhance the scalability and efficiency of DCPC further.
  • Extension to Other Model Architectures: Given its flexibility, DCPC can be extended to other model architectures beyond those tested in the paper. Research could focus on integrating DCPC with more complex models, such as attention mechanisms in transformers.
  • Biological Models of Learning: DCPC’s adherence to local computations makes it a strong candidate for further exploration in neurocomputational models, potentially leading to new insights into how biological systems perform inference and learning.

In conclusion, DCPC represents a significant step forward in the field of predictive coding and variational inference. With its ability to handle complex, structured generative models efficiently using local computations, DCPC not only advances the state of machine learning algorithms but also offers a compelling framework for biologically plausible models of learning.
