
Neural Networks as Spin Models: From Glass to Hidden Order Through Training (2408.06421v1)

Published 12 Aug 2024 in cond-mat.dis-nn, cs.LG, and nlin.AO

Abstract: We explore a one-to-one correspondence between a neural network (NN) and a statistical mechanical spin model where neurons are mapped to Ising spins and weights to spin-spin couplings. The process of training an NN produces a family of spin Hamiltonians parameterized by training time. We study the magnetic phases and the melting transition temperature as training progresses. First, we prove analytically that the common initial state before training--an NN with independent random weights--maps to a layered version of the classical Sherrington-Kirkpatrick spin glass exhibiting a replica symmetry breaking. The spin-glass-to-paramagnet transition temperature is calculated. Further, we use the Thouless-Anderson-Palmer (TAP) equations--a theoretical technique to analyze the landscape of energy minima of random systems--to determine the evolution of the magnetic phases on two types of NNs (one with continuous and one with binarized activations) trained on the MNIST dataset. The two NN types give rise to similar results, showing a quick destruction of the spin glass and the appearance of a phase with a hidden order, whose melting transition temperature $T_c$ grows as a power law in training time. We also discuss the properties of the spectrum of the spin system's bond matrix in the context of rich vs. lazy learning. We suggest that this statistical mechanical view of NNs provides a useful unifying perspective on the training process, which can be viewed as selecting and strengthening a symmetry-broken state associated with the training task.

Summary

  • The paper reveals a mapping between neural networks and spin models, demonstrating the transition from a spin glass state to hidden order during training.
  • It applies TAP equations and analytical methods to calculate evolving critical temperatures that grow as a power law with training time.
  • The findings bridge machine learning with statistical mechanics and offer insights for designing neuromorphic hardware and quantum systems.

Neural Networks as Spin Models: From Glass to Hidden Order Through Training

The paper "Neural Networks as Spin Models: From Glass to Hidden Order Through Training" by Richard Barney, Michael Winer, and Victor Galitski explores an intriguing correspondence between neural networks (NNs) and statistical mechanical spin models. This paper draws on the synergistic relationship between machine learning and statistical mechanics, applying concepts from the latter to provide a unifying perspective on the training of NNs.

Summary and Key Points

The authors investigate a one-to-one mapping between neurons in an NN and Ising spins in a statistical mechanical spin model. Weights between neurons are mapped to spin-spin couplings, and biases are analogous to magnetic fields. The training process in NNs thus becomes analogous to an evolving family of spin Hamiltonians parameterized by training time.
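
To make the mapping concrete, the sketch below assembles a symmetric bond matrix $J$ and field vector $h$ from a feed-forward network's per-layer weights and biases, under the assumption (implied by the layered architecture) that couplings exist only between adjacent layers. The function names and the $1/2$ energy convention are illustrative choices, not the authors' code.

```python
import numpy as np

def bond_matrix(weights, biases):
    """Assemble the spin model's bond matrix J and field vector h from a
    feed-forward NN. Assumes weights[l] has shape (n_{l+1}, n_l), coupling
    layer l to layer l+1, and biases[l] has length n_{l+1}."""
    sizes = [w.shape[1] for w in weights] + [weights[-1].shape[0]]
    offsets = np.cumsum([0] + sizes)
    n = offsets[-1]
    J = np.zeros((n, n))
    h = np.zeros(n)
    for l, w in enumerate(weights):
        r0, r1 = offsets[l + 1], offsets[l + 2]  # rows: spins of layer l+1
        c0, c1 = offsets[l], offsets[l + 1]      # cols: spins of layer l
        J[r0:r1, c0:c1] = w      # weights become couplings between adjacent layers
        J[c0:c1, r0:r1] = w.T    # symmetrize so that J_ij = J_ji
        h[r0:r1] = biases[l]     # biases act as local magnetic fields
    return J, h

def energy(J, h, s):
    """Ising energy of a spin configuration s in {-1, +1}^n."""
    return -0.5 * s @ J @ s - h @ s
```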

Initial State and Spin Glass Transition

Initially, an NN with random weights is shown to correspond to a layered version of the classical Sherrington-Kirkpatrick (SK) spin glass model. This model exhibits a spin-glass-to-paramagnet transition characterized by replica symmetry breaking. The transition temperature $T_c$ is calculated analytically for the multi-layer SK model as $T_c = [2C\cos(\pi/(L+2))]^{1/2}$, where $C$ and $L$ are structural parameters of the NN.
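
As a quick numerical check of the quoted formula (with $C$ and $L$ taken as given structural parameters, following the paper's definitions):

```python
import numpy as np

def sk_layered_Tc(C, L):
    """Spin-glass transition temperature of the layered SK model,
    T_c = [2 C cos(pi / (L + 2))]^(1/2), per the paper's analytic result."""
    return np.sqrt(2.0 * C * np.cos(np.pi / (L + 2)))

# As L grows, cos(pi/(L+2)) -> 1 and T_c -> sqrt(2C),
# recovering the classic SK value sqrt(2) for C = 1.
print(sk_layered_Tc(C=1.0, L=100))  # ~1.4139, close to sqrt(2) ~ 1.4142
```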

Training and Evolution of Magnetic Phases

The paper focuses on two NN architectures:

  1. A partially binarized NN (PBNN) that constrains neurons to $\pm 1$.
  2. A standard NN (SNN) with rectified linear unit (ReLU) activations.

Both types are trained on the MNIST dataset. Using the Thouless-Anderson-Palmer (TAP) equations, typically used to analyze energy landscapes of random systems, the authors examine the evolution of magnetic phases during training.
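
For context on how such an analysis works in practice, here is a minimal sketch of a damped fixed-point iteration of the TAP equations for an Ising system with bond matrix $J$ and fields $h$. The SK-style Onsager reaction term is the standard form; the initialization, damping, and stopping criterion are illustrative choices, not the authors' exact numerical procedure.

```python
import numpy as np

def tap_magnetizations(J, h, beta, n_iter=500, damping=0.5, tol=1e-8):
    """Solve the TAP equations by damped fixed-point iteration:
        m_i = tanh(beta * (h_i + sum_j J_ij m_j
                           - beta * m_i * sum_j J_ij^2 (1 - m_j^2)))
    The last term is the Onsager reaction correction to naive mean field."""
    rng = np.random.default_rng(0)
    m = 0.01 * rng.standard_normal(J.shape[0])  # small random initial condition
    J2 = J ** 2
    for _ in range(n_iter):
        onsager = beta * m * (J2 @ (1.0 - m ** 2))      # reaction term
        m_new = np.tanh(beta * (h + J @ m - onsager))
        if np.max(np.abs(m_new - m)) < tol:
            break
        m = (1.0 - damping) * m + damping * m_new       # damped update
    return m
```

Nontrivial fixed points ($m \neq 0$) below some temperature signal an ordered or glassy phase; tracking where they appear as $\beta$ varies locates the transition.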

Key Findings on Transition Temperature and Order Evolution

The critical temperature $T_c$ of each NN architecture evolves distinctively as training progresses:

  • Early in training, both the PBNN and SNN show rapid destruction of the spin glass state and the emergence of a phase characterized by hidden order.
  • For both NNs, $T_c$ exhibits power-law growth, $T_c(t) \propto t^\alpha$, indicative of a strengthening symmetry-broken state.
  • This hidden order is suggested to encode task-specific information required for the classification tasks.

The analysis indicates that training rapidly transforms the initial glassy phase into a structured order, evident from the change in the spectral properties of the bond matrix $J$. The largest eigenvalue of the bond matrix grows in a power-law fashion, suggesting a transition from a lazy to a rich learning regime in the NN.
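
A minimal way to track this spectral signature, assuming access to the bond matrix at a sequence of training checkpoints (the checkpoint variables below are hypothetical placeholders):

```python
import numpy as np

def leading_eigenvalue(J):
    """Largest eigenvalue of the symmetric bond matrix J; eigvalsh
    returns eigenvalues in ascending order, so take the last one."""
    return np.linalg.eigvalsh(J)[-1]

# Hypothetical usage: given bond matrices saved at training times ts,
# fit log(lambda_max) against log(t) to estimate the power-law exponent.
# lams = [leading_eigenvalue(J_t) for J_t in checkpoints]
# alpha, log_c = np.polyfit(np.log(ts), np.log(lams), 1)
```

An eigenvalue that detaches from the random-matrix bulk corresponds to the structured, task-aligned direction that the hidden order strengthens during training.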

Implications and Future Research Directions

This paper offers several practical and theoretical contributions:

  • Unified Perspective on Training: It posits that training NNs effectively selects and strengthens a small number of symmetry-broken states aligned with the given tasks.
  • Statistical Mechanical View: The correspondence established allows borrowing intuition from statistical mechanics, aiding in understanding the underlying mechanisms in NNs.
  • Neuromorphic Computing: Insights from this paper could guide the development of neuromorphic hardware, especially in systems where quantum components are involved.

Future Directions

The exploration opens several avenues for future research:

  1. Quantum Extensions: Investigating the effect of mapping NNs to quantum spin systems and comparing resultant differences with classical counterparts.
  2. Broader Dataset and Architectures: Extending the analysis to different NN architectures, datasets, and tasks to validate the generality of findings.
  3. Low-Temperature Behavior: Detailed examination of the low-temperature phases and the complete characterization of TAP solutions.

The paper’s findings underscore the utility of using statistical mechanics for a deeper understanding of NN training dynamics, representing a step toward integrating traditional physics methods with modern machine learning paradigms.