Backpropagation through space, time, and the brain (2403.16933v2)
Abstract: How physical networks of neurons, bound by spatio-temporal locality constraints, can perform efficient credit assignment remains, to a large extent, an open question. In machine learning, the answer is almost universally given by the error backpropagation algorithm, through both space and time. However, this algorithm is well known to rely on biologically implausible assumptions, in particular with respect to spatio-temporal (non-)locality. Alternative forward-propagation models such as real-time recurrent learning only partially solve the locality problem, and do so at the cost of scalability, owing to prohibitive storage requirements. We introduce Generalized Latent Equilibrium (GLE), a computational framework for fully local spatio-temporal credit assignment in physical, dynamical networks of neurons. We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of backpropagation through space and time in deep cortical networks with continuous-time neuronal dynamics and continuously active, local synaptic plasticity. In particular, GLE exploits the morphology of dendritic trees to enable more complex information storage and processing in single neurons, as well as the ability of biological neurons to phase-shift their output rate with respect to their membrane potential, which is essential in both directions of information propagation. For the forward computation, it enables the mapping of time-continuous inputs to neuronal space, effectively performing a spatio-temporal convolution. For the backward computation, it permits the temporal inversion of feedback signals, which consequently approximate the adjoint variables necessary for useful parameter updates.
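To illustrate the construction described in the abstract, the sketch below shows a mismatch energy of the kind referred to, together with the two quantities derived from it: neuronal dynamics obtained from a stationarity condition and synaptic plasticity obtained from gradient descent on the same energy. The notation (membrane potentials u_i, synaptic weights W_i, a rate function rho, and a prospective potential that adds a scaled time derivative to the membrane potential) and the specific functional forms are assumptions chosen for illustration in the spirit of the Latent Equilibrium lineage; they are not the paper's exact GLE equations.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Hypothetical, minimal mismatch-energy sketch (notation assumed, not the
% paper's exact formulation). u_i: membrane potentials of layer i,
% W_i: synaptic weights, \rho: rate function,
% \breve{u}_i = u_i + \tau \dot{u}_i: prospective ("phase-advanced") potential.
\begin{align}
  E &= \sum_i \tfrac{1}{2}\,\lVert e_i \rVert^2,
  \qquad
  e_i = u_i - W_i\,\rho(\breve{u}_{i-1}),
  \\
  % Stationarity of E with respect to the (prospective) potentials yields
  % leaky neuronal dynamics driven by bottom-up input and top-down errors:
  \tau\,\dot{u}_i &= -u_i + W_i\,\rho(\breve{u}_{i-1})
    + \rho'(\breve{u}_i) \odot \bigl(W_{i+1}^{\top} e_{i+1}\bigr),
  \\
  % Gradient descent on E yields a local, continuously active plasticity rule:
  \dot{W}_i &\propto -\frac{\partial E}{\partial W_i}
    = e_i\,\rho(\breve{u}_{i-1})^{\top}.
\end{align}
\end{document}
```

In a sketch of this kind, the prospective potential provides the phase advance between output rate and membrane potential mentioned in the abstract, while the top-down error term plays the role of the feedback signal whose temporal inversion, in the full framework, approximates the adjoint variables needed for useful parameter updates.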
Authors: Benjamin Ellenberger, Paul Haider, Jakob Jordan, Kevin Max, Ismael Jaras, Laura Kriener, Federico Benitez, Mihai A. Petrovici