
Generality of phase-evolution results across architectures, datasets, and tasks

Determine the extent to which the phase behavior reported for neural-network-derived multi-layer Ising spin systems, namely the monotonic and often power-law growth of the melting transition temperature T_c with training and the rapid replacement of the spin-glass transition by a hidden-order phase with a single Z2 symmetry-broken TAP solution, persists beyond the feed-forward architectures trained on MNIST that were studied, i.e., for other network architectures, datasets, and tasks.


Background

The paper establishes a mapping between neural networks and multi-layer Ising spin models, showing analytically that randomly initialized networks correspond to a layered Sherrington–Kirkpatrick (SK) spin glass with replica symmetry breaking. Using the Thouless–Anderson–Palmer (TAP) equations, the authors track the evolution of the system during training on MNIST for two network types (binarized activations and standard ReLU activations), finding that the glassy phase is quickly replaced by a phase with hidden order and that the melting transition temperature T_c grows with training, often as a power law.
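For context, the TAP self-consistency equations for a single-layer SK-type Ising system with couplings J_ij and inverse temperature \beta take the standard textbook form (the paper's layered variant is assumed to generalize this; the notation here is not necessarily the paper's):

m_i = \tanh\!\Big[\beta \sum_{j} J_{ij} m_j \;-\; \beta^2 m_i \sum_{j} J_{ij}^2 \big(1 - m_j^2\big)\Big],

where the second term is the Onsager reaction correction to naive mean field. Transitions such as the melting temperature T_c correspond to the temperatures at which nontrivial solutions of these equations appear or disappear.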

These findings are demonstrated for specific feed-forward architectures with three hidden layers trained on MNIST classification. The authors explicitly state that it remains to be determined how robust these observations are across different architectures, datasets, and tasks, leaving the generality of the results an open question.
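As a rough illustration of what tracking TAP solutions involves, the following is a minimal Python sketch, assuming the standard single-layer SK TAP equations above and random Gaussian couplings standing in for couplings derived from a trained network; the function name tap_fixed_point and all parameters are hypothetical and not taken from the paper.

import numpy as np

def tap_fixed_point(J, beta, n_iter=2000, damping=0.5, tol=1e-8, seed=0):
    """Damped fixed-point iteration of the single-layer SK TAP equations.

    J    : (N, N) symmetric coupling matrix with zero diagonal
    beta : inverse temperature 1/T
    Returns the magnetization vector m (converged or last iterate).
    """
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    m = rng.uniform(-1e-3, 1e-3, size=N)  # start near the paramagnet
    J2 = J ** 2
    for _ in range(n_iter):
        # Local field plus Onsager reaction term
        h = beta * (J @ m) - beta**2 * m * (J2 @ (1.0 - m**2))
        m_new = (1.0 - damping) * m + damping * np.tanh(h)
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m

# Illustrative usage with random SK-type couplings (variance ~ 1/N);
# the paper's analysis would instead use couplings read off a trained network.
N = 200
rng = np.random.default_rng(1)
J = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))
J = (J + J.T) / 2.0
np.fill_diagonal(J, 0.0)

for T in (2.0, 1.0, 0.5):
    m = tap_fixed_point(J, beta=1.0 / T)
    print(f"T = {T}: mean |m| = {np.mean(np.abs(m)):.4f}")

Scanning the temperature and checking where the iteration first sustains a nonzero solution gives a crude estimate of the melting temperature; the paper's actual multi-layer analysis is more involved than this single-layer toy.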

References

"It also remains to determine how well our results hold for different architectures, datasets, tasks, etc."

Neural Networks as Spin Models: From Glass to Hidden Order Through Training, Barney et al., arXiv:2408.06421, 12 Aug 2024, Section 5 (Conclusion).