Backpropagation through space, time, and the brain (2403.16933v2)

Published 25 Mar 2024 in q-bio.NC, cs.AI, cs.LG, cs.NE, and eess.SP

Abstract: How physical networks of neurons, bound by spatio-temporal locality constraints, can perform efficient credit assignment, remains, to a large extent, an open question. In machine learning, the answer is almost universally given by the error backpropagation algorithm, through both space and time. However, this algorithm is well-known to rely on biologically implausible assumptions, in particular with respect to spatio-temporal (non-)locality. Alternative forward-propagation models such as real-time recurrent learning only partially solve the locality problem, but only at the cost of scaling, due to prohibitive storage requirements. We introduce Generalized Latent Equilibrium (GLE), a computational framework for fully local spatio-temporal credit assignment in physical, dynamical networks of neurons. We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of backpropagation through space and time in deep cortical networks with continuous-time neuronal dynamics and continuously active, local synaptic plasticity. In particular, GLE exploits the morphology of dendritic trees to enable more complex information storage and processing in single neurons, as well as the ability of biological neurons to phase-shift their output rate with respect to their membrane potential, which is essential in both directions of information propagation. For the forward computation, it enables the mapping of time-continuous inputs to neuronal space, effectively performing a spatio-temporal convolution. For the backward computation, it permits the temporal inversion of feedback signals, which consequently approximate the adjoint variables necessary for useful parameter updates.

Authors (8)
  1. Benjamin Ellenberger (2 papers)
  2. Paul Haider (2 papers)
  3. Jakob Jordan (19 papers)
  4. Kevin Max (8 papers)
  5. Ismael Jaras (2 papers)
  6. Laura Kriener (14 papers)
  7. Federico Benitez (8 papers)
  8. Mihai A. Petrovici (44 papers)
Citations (5)

Summary

An Expert Analysis of "Backpropagation through space, time, and the brain"

The paper "Backpropagation through space, time, and the brain" introduces a novel computational framework, Generalized Latent Equilibrium (GLE), which addresses the spatio-temporal credit assignment problem in physical neuronal networks. This framework provides an alternative to traditional machine learning methods such as backpropagation through time (BPTT), offering a biologically plausible solution devoid of their non-locality issues and excessive memory demands. Below, I provide a comprehensive exploration of the framework, its merits, theoretical underpinnings, and implications for both neuroscience and neuromorphic computing.

Overview and Key Contributions

The authors start from a fundamental constraint on physical neuronal networks, both biological and artificial: synaptic updates must rely only on information that is locally available in space and time. Classical algorithms such as error backpropagation (BP) and BPTT violate this constraint, since they perform credit assignment using non-local information. In response, the paper proposes GLE, which embeds biologically plausible neuronal dynamics in an energy-based model whose learning rules are local in both space and time.
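
To make the non-locality concrete, consider a standard recurrent network with hidden state h_t = f(W h_{t-1}, x_t) trained on a loss L = \sum_t L_t (a textbook formulation, not specific to this paper). The BPTT gradient unrolls over the entire trajectory:

\[
\frac{\partial L}{\partial W}
= \sum_{t} \frac{\partial L_t}{\partial h_t}
\sum_{k \le t}
\left( \prod_{k < j \le t} \frac{\partial h_j}{\partial h_{j-1}} \right)
\frac{\partial h_k}{\partial W},
\]

where \partial h_k / \partial W denotes the immediate dependence of h_k on W. The contribution of time step k thus depends on Jacobians and errors from all later steps, i.e., on information that is neither spatially nor temporally local to the synapse being updated; this is precisely the assumption GLE removes.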

The core of GLE is realized through four postulates. First, biological neurons are assumed capable of temporal integration and differentiation, i.e., of retrospective and prospective operations. These capabilities, seldom exploited in standard models, allow neurons to dynamically adjust their temporal attention windows. Second, an energy function built from neuron-local mismatches serves as the foundation of the model's dynamics. The remaining postulates, stationarity of this energy for the neuronal dynamics and gradient descent on it for the synaptic dynamics, yield GLE's local updates without requiring explicit temporal inversion.
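
For orientation, a Latent-Equilibrium-style form of such a mismatch energy for a layered network reads as follows (a sketch only; the paper's general formulation allows neuron-specific prospective and retrospective time constants):

\[
E \;=\; \tfrac{1}{2}\sum_{i} \bigl\| \breve u_i - W_i\, \varphi(\breve u_{i-1}) \bigr\|^2,
\qquad
\breve u_i = u_i + \tau_i\, \dot u_i ,
\]

where \breve u_i is the prospective membrane potential of layer i and \varphi the activation function. Neuronal dynamics are obtained by requiring stationarity of E, and plasticity follows gradient descent, \dot W_i \propto -\partial E / \partial W_i; both involve only quantities local to the corresponding neurons and synapses.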

Theoretical Underpinnings

From a theoretical perspective, GLE extends the Latent Equilibrium (LE) model. Unlike LE, GLE does not require all neurons to operate with identical time constants across the network. This flexibility introduces memory effects and dynamic temporal error transmission, making GLE well suited to spatio-temporal learning tasks. The network dynamics derived from the GLE postulates implement a real-time approximation of the adjoint method and BPTT by using prospective coding to align backpropagated errors in time.
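
As a minimal sketch of the prospective mechanism (illustrative only, not the authors' implementation; the time constant and input signal are arbitrary choices), the following simulates a leaky integrator whose membrane potential lags its input, together with a prospective readout u + tau * du/dt that undoes this lag:

```python
import numpy as np

tau = 0.05   # membrane time constant in seconds (illustrative value)
dt = 0.001   # Euler integration step in seconds
t = np.arange(0, 1.0, dt)
x = np.sin(2 * np.pi * 2.0 * t)      # example time-continuous input

u = 0.0
u_trace, u_prosp_trace = [], []
for inp in x:
    du = (-u + inp) / tau            # retrospective (low-pass) membrane dynamics
    u_prosp = u + tau * du           # prospective potential: u + tau * du/dt
    u += dt * du                     # integrate the membrane potential
    u_trace.append(u)
    u_prosp_trace.append(u_prosp)

# The membrane potential u lags the input by roughly tau, whereas the
# prospective potential recovers the instantaneous input (here exactly,
# since u + tau*du/dt equals the input for this simple neuron).
print("retrospective correlation:", np.corrcoef(x, u_trace)[0, 1])
print("prospective correlation  :", np.corrcoef(x, u_prosp_trace)[0, 1])
```

In GLE, the same phase-advancing operation applied to feedback signals lets error information arrive "on time" at the synapses rather than one membrane time constant too late.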

The authors show that GLE's backward (error) dynamics employ inverse temporal operators, producing error signals that are well synchronized with local neuronal states and thereby enabling efficient learning. Notably, a Fourier-space analysis shows that GLE reproduces the phase-shift characteristics of the adjoint equations, which is essential for learning temporal dependencies in real time.
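
The frequency-domain argument can be sketched compactly (a restatement of the phase-shift point above, not the paper's full derivation). A leaky membrane with time constant \tau acts as a low-pass filter, while the prospective operator applies the inverse transfer function:

\[
\tau \dot u + u = x
\;\;\xrightarrow{\ \mathcal{F}\ }\;\;
\hat u(\omega) = \frac{\hat x(\omega)}{1 + i\omega\tau},
\qquad
\left(1 + \tau \tfrac{d}{dt}\right) u
\;\;\xrightarrow{\ \mathcal{F}\ }\;\;
(1 + i\omega\tau)\,\hat u(\omega) = \hat x(\omega).
\]

The membrane's phase lag of \arctan(\omega\tau) is thus cancelled by the prospective operation. Applied to feedback signals, the same inversion advances errors in time, which is how the backward dynamics come to approximate the anti-causal adjoint variables in real time.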

Practical and Theoretical Implications

Because GLE is formulated as an energy-based model, it inherits the robustness associated with such approaches. The reduction of the required computations to purely local interactions makes it well suited for neuromorphic implementations. The proposed architecture promises gains in energy efficiency for artificial neuronal systems and also suggests plausible mechanisms for error propagation in biological circuits, in line with experimental evidence from neuroscience.

Moreover, the authors demonstrate the practicality of GLE through extensive simulations. On challenging spatio-temporal tasks such as MNIST-1D and the Google Speech Commands dataset, it remains competitive with state-of-the-art methods while learning in an intrinsically online manner, a property valuable for both biological learning and real-time applications.

Future Prospects

This framework opens new avenues for neuroscience research, particularly in understanding how learning occurs on multiple timescales in the brain. The physiological basis for prospective coding and its integration into structured neuronal networks require further exploration. It paves the way for more sophisticated models of cortical microcircuits that could offer insights into the intricacies of neuronal computation.

In the field of artificial intelligence and neuromorphic computing, GLE presents a novel blueprint for designing systems that mimic the efficiency of natural intelligence. Its applicability to low-power, continuous learning tasks signals a potential shift in how AI systems interface with dynamic environments.

To conclude, "Backpropagation through space, time, and the brain" makes a significant contribution to the synthesis of biological insight and machine learning methodology, pushing the boundary of what is achievable in both theoretical neuroscience and AI hardware design. The framework may well guide future developments in adaptive systems that learn from spatio-temporal data in real time.
