Temporal Contrastive Learning through implicit non-equilibrium memory (2312.17723v2)
Abstract: The backpropagation method has enabled transformative uses of neural networks. Alternatively, for energy-based models, local learning methods involving only nearby neurons offer benefits in terms of decentralized training, and allow for the possibility of learning in computationally-constrained substrates. One class of local learning methods contrasts the desired, clamped behavior with spontaneous, free behavior. However, directly contrasting free and clamped behaviors requires explicit memory. Here, we introduce `Temporal Contrastive Learning', an approach that uses integral feedback in each learning degree of freedom to provide a simple form of implicit non-equilibrium memory. During training, free and clamped behaviors are shown in a sawtooth-like protocol over time. When combined with integral feedback dynamics, these alternating temporal protocols generate the implicit memory necessary for comparing free and clamped behaviors, broadening the range of physical and biological systems capable of contrastive learning. Finally, we show that non-equilibrium dissipation improves learning quality, and we determine a Landauer-like energy cost of contrastive learning through physical dynamics.
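The idea in the abstract can be sketched with a minimal toy model. This is an illustrative simplification, not the paper's exact scheme: here a slow integrator `m` serves as the implicit memory of the free-phase Hebbian signal, and the weight update during the clamped phase contrasts the instantaneous clamped signal with that memory. All names and parameters (`eta`, `tau`, `phase_len`, the explicit phase gating) are assumptions made for this sketch.

```python
# Toy temporal contrastive learning: one weight w, input x, desired output `target`.
# Contrastive Hebbian target update: dw ~ <out * in>_clamped - <out * in>_free.
# Instead of storing the free-phase term explicitly, a leaky integrator m
# remembers it between phases (a crude stand-in for the paper's implicit
# non-equilibrium memory). Parameters and gating are illustrative assumptions.

x, target = 2.0, 3.0      # true solution w* = target / x = 1.5
w, m = 0.0, 0.0           # learning degree of freedom and its integrator memory
eta, tau = 0.01, 5.0      # learning rate and memory timescale
phase_len, n_periods = 20, 300

for _ in range(n_periods):
    # Free phase: the output runs spontaneously; the integrator tracks the
    # local Hebbian signal, storing a memory of free behavior.
    for _ in range(phase_len):
        free_signal = (w * x) * x
        m += (free_signal - m) / tau
    # Clamped phase: the output is held at the target; the weight moves with
    # the difference between the clamped signal and the remembered free signal.
    for _ in range(phase_len):
        clamped_signal = target * x
        w += eta * (clamped_signal - m)

print(round(w, 2))  # converges to target / x = 1.5
```

At the fixed point the memory equals the free-phase signal, `m = w * x * x`, and the clamped update vanishes only when `target * x = m`, i.e. `w = target / x`; the alternating protocol thus implements a clamped-minus-free contrast without any explicit side-by-side comparison.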