Brain-like variational inference (2410.19315v2)

Published 25 Oct 2024 in cs.AI, cs.LG, and q-bio.NC

Abstract: Inference in both brains and machines can be formalized by optimizing a shared objective: maximizing the evidence lower bound (ELBO) in machine learning, or minimizing variational free energy (F) in neuroscience (ELBO = -F). While this equivalence suggests a unifying framework, it leaves open how inference is implemented in neural systems. Here, we show that online natural gradient descent on F, under Poisson assumptions, leads to a recurrent spiking neural network that performs variational inference via membrane potential dynamics. The resulting model -- the iterative Poisson variational autoencoder (iP-VAE) -- replaces the encoder network with local updates derived from natural gradient descent on F. Theoretically, iP-VAE yields a number of desirable features such as emergent normalization via lateral competition, and hardware-efficient integer spike count representations. Empirically, iP-VAE outperforms both standard VAEs and Gaussian-based predictive coding models in sparsity, reconstruction, and biological plausibility. iP-VAE also exhibits strong generalization to out-of-distribution inputs, exceeding hybrid iterative-amortized VAEs. These results demonstrate how deriving inference algorithms from first principles can yield concrete architectures that are simultaneously biologically plausible and empirically effective.


Summary

  • The paper’s main contribution is bridging the Free Energy Principle with ELBO maximization to construct a brain-inspired model for iterative Bayesian inference.
  • It employs Poisson assumptions with recurrent updates that mimic spiking neural dynamics, enhancing adaptiveness and generalization over traditional models.
  • Empirical results show improved reconstruction accuracy and efficient transfer learning, highlighting its potential for neuromorphic and energy-efficient AI systems.

An Evaluation of a Brain-Inspired Framework for Iterative Inference in NeuroAI

The paper "A prescriptive theory for brain-like inference" proposes an innovative approach to constructing a brain-inspired neural network that employs iterative inference to perform Bayesian posterior updates. This theory integrates neuroscience principles with machine learning techniques, achieving enhanced adaptiveness and generalization capabilities over traditional neural models by optimizing the Evidence Lower Bound (ELBO) in a spiking neural network.

Overview and Theoretical Foundations

The paper rests on two intertwined frameworks: the Free Energy Principle (FEP), a theoretical framework in neuroscience, and ELBO maximization, widely used in machine learning. The researchers bridge these frameworks to derive a recurrent spiking neural network that performs robust Bayesian inference through its membrane potential dynamics.
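For reference, the standard variational identity behind this bridge (with data x, latents z, approximate posterior q, and generative model p) can be written as:

```latex
\mathcal{F}(q) \;=\; \mathbb{E}_{q(z)}\big[\log q(z) - \log p(x, z)\big]
\;=\; D_{\mathrm{KL}}\big(q(z)\,\|\,p(z)\big) \;-\; \mathbb{E}_{q(z)}\big[\log p(x \mid z)\big]
\;=\; -\,\mathrm{ELBO}(q),
\qquad
\log p(x) \;=\; \mathrm{ELBO}(q) \;+\; D_{\mathrm{KL}}\big(q(z)\,\|\,p(z \mid x)\big).
```

Since log p(x) does not depend on q, minimizing the variational free energy F over q is exactly equivalent to maximizing the ELBO, i.e., ELBO = -F, which is the equivalence the abstract states.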

Key to this approach is the adoption of Poisson distributions over Gaussian assumptions, resulting in the iterative Poisson Variational Autoencoder (iP-VAE). This choice aligns with empirical findings in neuroscience, where biological neurons are conditionally Poisson, and contrasts sharply with Gaussian assumptions, which often create mismatches when mapping ML models onto neural circuits.

Model and Algorithmic Implementation

The iP-VAE model extends the Poisson VAE framework into an iterative design, accommodating sequence data under Poisson assumptions. Methodologically, it abandons the amortized encoder used in standard VAEs, instead using a recurrent update rule that combines the ELBO objective with neural dynamics inspired by biological neurons.
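Assuming factorized Poisson posteriors with rates r and Poisson priors with rates r_0 (as in the Poisson VAE line of work; the paper's exact parameterization may differ), the KL term that the recurrent updates trade off against reconstruction error has a simple closed form per latent dimension:

```latex
D_{\mathrm{KL}}\!\big(\mathrm{Poisson}(r)\,\|\,\mathrm{Poisson}(r_0)\big)
\;=\; r \log\frac{r}{r_0} \;-\; r \;+\; r_0 .
```

This term penalizes rates that drift from the prior and vanishes when r = r_0, which is one route to the sparsity and normalization effects described in the abstract.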

Through iterative updates on the log rates u(t), the framework naturally aligns with the membrane potential dynamics of spiking neurons. The updates effectively embody a predictive coding scheme enhanced with Bayesian updates for reconstructing the input sequence. This approach avoids several issues associated with earlier predictive coding models, notably by ensuring positivity in firing rates and avoiding explicit prediction terms.
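To make these dynamics concrete, the following is a minimal NumPy sketch of gradient-based inference on log rates. The linear decoder `Phi`, the Gaussian reconstruction term, the plain (not natural) gradient step, and the step size `lr` are all illustrative assumptions rather than the paper's exact update rule.

```python
import numpy as np

# Sketch: iterative inference on membrane potentials u (log Poisson rates),
# under simplifying assumptions NOT taken from the paper: fixed linear decoder,
# Gaussian reconstruction error, plain gradient steps, mean-rate reconstruction.
rng = np.random.default_rng(0)
n_pixels, n_latents, n_steps = 64, 32, 50
lr = 0.1                                                   # step size (illustrative)
Phi = rng.normal(scale=0.1, size=(n_pixels, n_latents))    # decoder weights (assumed)
u_prior = np.zeros(n_latents)                              # log of the prior Poisson rate
x = rng.normal(size=n_pixels)                              # one input frame (placeholder data)

u = u_prior.copy()                                         # membrane potentials = log firing rates
for _ in range(n_steps):
    r = np.exp(u)                                          # Poisson rates
    spikes = rng.poisson(r)                                # integer spike counts (readout only here)
    err = x - Phi @ r                                      # prediction error on the reconstruction
    # Approximate ELBO gradient w.r.t. u: likelihood term minus the derivative
    # of the Poisson KL, d/du [r*log(r/r0) - r + r0] = r * (u - u_prior).
    grad = r * (Phi.T @ err) - r * (u - u_prior)
    u += lr * grad                                         # ascend the ELBO / descend free energy

print("final rates:", np.round(np.exp(u), 3))
```

In this reading, each step integrates a prediction-error drive with a leak toward the prior, which is the sense in which the update plays the role of membrane potential dynamics.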

Empirical Results and Generalization

The empirical evaluations demonstrate the iP-VAE model's effectiveness across several key metrics. It achieves superior reconstruction accuracy and adaptability, and generalizes well to out-of-distribution (OOD) conditions compared with existing amortized models and iterative alternatives such as the iterative amortized VAE (ia-VAE) and semi-amortized VAE (sa-VAE).

Significantly, iP-VAE closes the performance gap identified in preceding work comparing Poisson variants and traditional sparse coding algorithms. The model's learned features are compositional, enhancing its ability to generalize across different datasets and perturbations, for instance handling rotated MNIST and transferring from MNIST to Omniglot without retraining.
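As one concrete way to probe this kind of OOD behaviour, the sketch below measures reconstruction error on rotated inputs without any retraining. The `reconstruct` callable is a hypothetical stand-in for any trained model's inference-plus-decoding path; the identity model at the bottom is only a smoke test.

```python
import numpy as np
from scipy.ndimage import rotate

def ood_reconstruction_error(reconstruct, images, angles=(0, 15, 30, 45, 90)):
    """Mean squared reconstruction error on rotated copies of `images`.

    `reconstruct` maps a batch of flattened images to reconstructions;
    plug in a trained model here (hypothetical interface).
    """
    results = {}
    for angle in angles:
        rotated = np.stack([rotate(img, angle, reshape=False, order=1) for img in images])
        flat = rotated.reshape(len(rotated), -1)
        recon = reconstruct(flat)
        results[angle] = float(np.mean((recon - flat) ** 2))
    return results

# Smoke test with random "images" and an identity model (stand-in for a real VAE).
imgs = np.random.default_rng(1).random((8, 28, 28))
print(ood_reconstruction_error(lambda x: x, imgs))
```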

Implications and Future Directions

The iP-VAE framework offers an adaptable, energy-efficient model that aligns with well-documented principles of neural computation. Its mapping onto biological mechanisms, including local updates and integer spike count representations, could significantly influence developments in neuromorphic hardware and energy-efficient AI systems.

Future explorations could delve into hierarchical versions of this model or extend its capabilities to dynamic, non-stationary data, such as video sequences. Such enhancements could deepen our knowledge of neural-inspired computation in artificial systems, pushing the boundaries of unsupervised learning paradigms aligned with biological efficiency and complexity.

In conclusion, the paper solidifies the practical and theoretical importance of integrating Poisson-based ELBO optimization with iterative inference. This research not only introduces a compelling brain-inspired architecture but also provides a fertile ground for ongoing advancements in NeuroAI and machine learning's future role in decoding and emulating neural processing.
