Brain-like variational inference (2410.19315v2)
Abstract: Inference in both brains and machines can be formalized by optimizing a shared objective: maximizing the evidence lower bound (ELBO) in machine learning, or minimizing variational free energy (F) in neuroscience (ELBO = -F). While this equivalence suggests a unifying framework, it leaves open how inference is implemented in neural systems. Here, we show that online natural gradient descent on F, under Poisson assumptions, leads to a recurrent spiking neural network that performs variational inference via membrane potential dynamics. The resulting model -- the iterative Poisson variational autoencoder (iP-VAE) -- replaces the encoder network with local updates derived from natural gradient descent on F. Theoretically, iP-VAE yields a number of desirable features such as emergent normalization via lateral competition, and hardware-efficient integer spike count representations. Empirically, iP-VAE outperforms both standard VAEs and Gaussian-based predictive coding models in sparsity, reconstruction, and biological plausibility. iP-VAE also exhibits strong generalization to out-of-distribution inputs, exceeding hybrid iterative-amortized VAEs. These results demonstrate how deriving inference algorithms from first principles can yield concrete architectures that are simultaneously biologically plausible and empirically effective.
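The identity ELBO = -F stated in the abstract can be checked numerically on a toy problem. The sketch below is not the iP-VAE model: it assumes a hypothetical three-state discrete latent variable with a Poisson likelihood, and every name and number in it (`prior`, `rates`, `x`, `q`) is made up for illustration. It computes the ELBO in its "reconstruction minus KL" form and the variational free energy in its "expected energy minus entropy" form, then confirms that they differ only in sign and that the gap to the log evidence is the KL divergence from the approximate to the exact posterior.

```python
import math

def poisson_logpmf(k, rate):
    """Log-probability of observing count k under a Poisson with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def kl(q, p):
    """KL(q || p) for two discrete distributions given as lists."""
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p) if qi > 0)

# Hypothetical toy model (illustrative values, not taken from the paper).
prior = [0.5, 0.3, 0.2]   # p(z) over three latent states
rates = [1.0, 4.0, 9.0]   # Poisson rate of x for each latent state
x = 5                     # observed spike count
q = [0.2, 0.5, 0.3]       # an arbitrary approximate posterior q(z)

log_lik = [poisson_logpmf(x, r) for r in rates]              # log p(x | z)
log_joint = [math.log(p) + l for p, l in zip(prior, log_lik)]  # log p(x, z)

# ELBO written as expected log-likelihood minus KL(q || prior).
elbo = sum(qi * l for qi, l in zip(q, log_lik)) - kl(q, prior)

# Free energy written as E_q[log q(z) - log p(x, z)].
free_energy = sum(qi * (math.log(qi) - lj) for qi, lj in zip(q, log_joint))

# Exact log evidence and posterior, tractable here because z is discrete.
m = max(log_joint)
log_evidence = m + math.log(sum(math.exp(lj - m) for lj in log_joint))
posterior = [math.exp(lj - log_evidence) for lj in log_joint]

# The bound gap equals KL(q || posterior), so the ELBO never exceeds log p(x).
gap = log_evidence - elbo

print(f"ELBO = {elbo:.6f}, F = {free_energy:.6f}")
print(f"log p(x) = {log_evidence:.6f}, gap = {gap:.6f}, KL = {kl(q, posterior):.6f}")
```

Setting `q` equal to `posterior` closes the gap, which is the sense in which minimizing F over `q` performs inference.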