Statistical mechanics of continual learning: variational principle and mean-field potential (2212.02846v4)
Abstract: A major obstacle to artificial general intelligence is the continual learning of multiple tasks of a different nature. Various heuristic tricks have recently been proposed, from both machine learning and neuroscience angles, but they lack a unified theoretical grounding. Here, we focus on continual learning in single-layered and multi-layered neural networks with binary weights. We propose a variational Bayesian learning setting in which the networks are trained in a field space, rather than in the discrete-weight space where gradients are ill-defined; moreover, weight uncertainty is naturally incorporated and modulates synaptic resources among tasks. From a physics perspective, we translate variational continual learning into the Franz-Parisi thermodynamic-potential framework, where knowledge of previous tasks acts both as a prior and as a reference. We thus interpret continual learning of the binary perceptron in a teacher-student setting as a Franz-Parisi potential computation. The learning performance can then be studied analytically with mean-field order parameters, whose predictions coincide with numerical experiments using stochastic gradient descent. Based on the variational principle and a Gaussian-field approximation of the internal preactivations in hidden layers, we also derive a learning algorithm that accounts for weight uncertainty, solves continual learning with binary weights in multi-layered neural networks, and outperforms the currently available metaplasticity algorithm. Our principled frameworks also connect to elastic weight consolidation, weight-uncertainty-modulated learning, and neuroscience-inspired metaplasticity, providing a theory-grounded method for real-world multi-task learning with deep networks.
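To make the field-space idea concrete, the sketch below is a minimal, self-contained illustration and not the authors' exact algorithm: each binary weight is represented by a continuous variational field, its mean is the tanh of the field, and the per-weight uncertainty 1 − m² modulates how strongly field-space gradients can move it, so weights that earlier tasks have made confident are effectively protected. The functions `make_task` and `train_on_task`, the teacher-student perceptron setup, and all hyperparameters are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 101          # input dimension (odd, so ±1 dot products never vanish); toy choice
lr = 0.1         # field-space learning rate; toy choice

theta = np.zeros(N)   # variational fields shared across tasks; mean weight m = tanh(theta)

def make_task(n_samples=500):
    """One task = a random teacher perceptron with binary (±1) weights."""
    teacher = rng.choice([-1.0, 1.0], size=N)
    X = rng.choice([-1.0, 1.0], size=(n_samples, N))
    y = np.sign(X @ teacher)
    return X, y

def train_on_task(X, y, epochs=50):
    """Perceptron-style updates taken in field space, damped by weight uncertainty."""
    global theta
    for _ in range(epochs):
        m = np.tanh(theta)                     # mean of each binary weight
        margins = y * (X @ m) / np.sqrt(N)
        mis = margins <= 0                     # samples not yet correctly classified
        if not mis.any():
            break
        grad_m = (y[mis, None] * X[mis]).mean(axis=0) / np.sqrt(N)
        # chain rule through tanh: dm/dtheta = 1 - m^2, which is also the variance
        # of a ±1 weight with mean m, so confident (low-uncertainty) weights barely move
        theta += lr * grad_m * (1.0 - m**2)

def accuracy(X, y):
    w = np.where(np.tanh(theta) >= 0, 1.0, -1.0)   # binarize the means for evaluation
    return float((np.sign(X @ w) == y).mean())

tasks = [make_task() for _ in range(2)]
for t, (Xt, yt) in enumerate(tasks):
    train_on_task(Xt, yt)
    print(f"after task {t}:",
          [round(accuracy(Xk, yk), 2) for Xk, yk in tasks[: t + 1]])
```

The sketch only illustrates the uncertainty-modulated, field-space update that the abstract describes; the paper's variational treatment additionally maintains a posterior over the binary weights and connects it to the Franz-Parisi potential analysis.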
- Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation, 24:109–165, 1989.
- Continual lifelong learning with neural networks: A review. Neural Networks, 113:54–71, 2019.
- Continual learning in a multi-layer network of an electric fish. Cell, 179(6):1382–1392.e10, 2019.
- Algorithmic insights on continual learning from fruit flies. arXiv:2107.07617, 2021.
- Continual task learning in natural and artificial agents. arXiv:2210.04520, 2022.
- Alleviating catastrophic forgetting using context-dependent gating and synaptic stabilization. Proceedings of the National Academy of Sciences, 115(44):E10467–E10475, 2018.
- Brain-inspired replay for continual learning with artificial neural networks. Nature Communications, 11(1):4069, 2020.
- Synaptic metaplasticity in binarized neural networks. Nature Communications, 12(1):2549, 2021.
- Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526, 2017.
- Continual learning through synaptic intelligence. Proceedings of Machine Learning Research, 70:3987–3995, 2017.
- Overcoming catastrophic forgetting with hard attention to the task. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4548–4557. PMLR, 2018.
- A unifying Bayesian view of continual learning. arXiv:1902.06494, 2019.
- Task-agnostic continual learning using online variational Bayes. arXiv:1803.10123, 2018.
- Variational continual learning. In International Conference on Learning Representations, 2018.
- Uncertainty-guided continual learning with Bayesian neural networks. In International Conference on Learning Representations, 2020.
- Continual learning with adaptive weights (claw). In International Conference on Learning Representations, 2020.
- Phase transitions in transfer learning for high-dimensional perceptrons. Entropy, 23:400, 2021.
- Generalization in multitask deep neural classifiers: a statistical physics approach. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
- Statistical mechanical analysis of catastrophic forgetting in continual learning with teacher and student networks. Journal of the Physical Society of Japan, 90(10):104001, 2021.
- Continual learning in the teacher-student setup: Impact of task similarity. arXiv:2107.04384, 2021.
- Probabilistic brains: knowns and unknowns. Nature Neuroscience, 16(9):1170–1178, 2013.
- Origin of the computational hardness for learning with binary synapses. Physical Review E, 90:052813, 2014.
- Recipes for metastable states in spin glasses. Journal de Physique I, 5(11):1401–1415, 1995.
- Haiping Huang. Statistical Mechanics of Neural Networks. Springer, Singapore, 2022.
- W. Krauth and M. Mézard. Storage capacity of memory networks with binary couplings. Journal de Physique (France), 50:3057, 1989.
- G. Györgyi. First-order transition to perfect generalization in a neural network with binary synapses. Physical Review A, 41(12):7097–7100, 1990.
- Learning from examples in large neural networks. Physical Review Letters, 65:1683–1686, 1990.
- Learning credit assignment. Physical Review Letters, 125:178301, 2020.
- Role of synaptic stochasticity in training low-precision neural networks. Physical Review Letters, 120:268103, 2018.
- Haiping Huang. Variational mean-field theory for training restricted Boltzmann machines with binary synapses. Physical Review E, 102:030301(R), 2020.
- Synaptic plasticity as Bayesian inference. Nature Neuroscience, 24:565–571, 2021.
- Code available at https://github.com/Chan-Li/VCL