Dataset-learning duality and emergent criticality (2405.17391v2)
Abstract: In artificial neural networks, the activation dynamics of non-trainable variables is strongly coupled to the learning dynamics of trainable variables. During the activation pass, the boundary neurons (e.g., input neurons) are mapped to the bulk neurons (e.g., hidden neurons), and during the learning pass, both bulk and boundary neurons are mapped to changes in the trainable variables (e.g., weights and biases). For example, in feed-forward neural networks, forward propagation is the activation pass and backward propagation is the learning pass. We show that the composition of the two maps establishes a duality map between a subspace of non-trainable boundary variables (e.g., the dataset) and a tangent subspace of trainable variables (i.e., learning). In general, the dataset-learning duality is a complex non-linear map between high-dimensional spaces, but in a learning equilibrium the problem can be linearized and reduced to many weakly coupled one-dimensional problems. We use the duality to study the emergence of criticality, i.e., power-law distributions of the fluctuations of trainable variables. In particular, we show that criticality can emerge in the learning system even when the dataset itself is in a non-critical state, and that the power-law distribution can be modified by changing either the activation function or the loss function.
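The duality map sketched in the abstract can be made concrete with a toy example. Below is a minimal NumPy sketch, under assumptions not fixed by the abstract (a single hidden layer, tanh activation, mean-squared-error loss), of the composition of the activation pass (forward propagation) and the learning pass (backpropagation) for a small feed-forward network: a boundary state (an input-target pair drawn from the dataset) is mapped to a tangent vector in the space of trainable variables (the gradient of the loss). All names and shapes here are illustrative, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Trainable variables: weights and biases of a tiny feed-forward network.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # boundary -> bulk
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)   # bulk -> output

def activation_pass(x):
    """Forward propagation: map boundary neurons (input x) to bulk neurons."""
    h = np.tanh(W1 @ x + b1)     # bulk (hidden) activations
    y = W2 @ h + b2              # output neurons
    return h, y

def learning_pass(x, h, y, target):
    """Backward propagation: map boundary and bulk states to changes in the
    trainable variables, here gradients of L = 0.5 * ||y - target||^2."""
    dy = y - target                      # dL/dy
    dW2, db2 = np.outer(dy, h), dy
    dh = (W2.T @ dy) * (1.0 - h**2)      # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = np.outer(dh, x), dh
    return dW1, db1, dW2, db2

def duality_map(x, target):
    """Composition of the two passes: one element of the dataset (a boundary
    state) is mapped to a tangent vector of the trainable variables."""
    h, y = activation_pass(x)
    return learning_pass(x, h, y, target)

# One dataset element -> one learning step (tangent vector).
x, target = rng.normal(size=4), rng.normal(size=2)
dW1, db1, dW2, db2 = duality_map(x, target)
print(dW1.shape, dW2.shape)  # (8, 4) (2, 8)
```

Sampling many dataset elements through `duality_map` and histogramming the resulting updates is then a natural way to probe the distribution of fluctuations of the trainable variables, whose power-law tails are the object of the criticality analysis.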