Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition (2310.06301v1)

Published 10 Oct 2023 in cs.LG and cs.AI

Abstract: We investigate phase transitions in a Toy Model of Superposition (TMS) using Singular Learning Theory (SLT). We derive a closed formula for the theoretical loss and, in the case of two hidden dimensions, discover that regular $k$-gons are critical points. We present supporting theory indicating that the local learning coefficient (a geometric invariant) of these $k$-gons determines phase transitions in the Bayesian posterior as a function of training sample size. We then show empirically that the same $k$-gon critical points also determine the behavior of SGD training. The picture that emerges adds evidence to the conjecture that the SGD learning trajectory is subject to a sequential learning mechanism. Specifically, we find that the learning process in TMS, be it through SGD or Bayesian learning, can be characterized by a journey through parameter space from regions of high loss and low complexity to regions of low loss and high complexity.

Citations (8)

Summary

  • The paper derives closed-form potentials via singular learning theory to characterize Bayesian phase transitions in the TMS model.
  • It identifies local learning coefficients that signal critical k-gon phases, with a notable 5-gon to 6-gon transition around 600 samples.
  • Empirical results from MCMC sampling and SGD trajectories confirm that dynamical loss plateaus align with Bayesian critical points, enhancing model interpretability.

Dynamical versus Bayesian Phase Transitions in a Toy Model of Superposition

This paper presents an in-depth study of phase transitions in a Toy Model of Superposition (TMS) with two hidden dimensions, analyzed through the framework of Singular Learning Theory (SLT). The authors derive a closed formula for the population loss, characterize its critical points theoretically, and verify the resulting phase structure empirically using MCMC sampling of the Bayesian posterior and SGD training trajectories.

Theoretical Framework and Methodology

The authors employ SLT to analyze the Bayesian phase transitions in the TMS model. Using SLT, they derive a closed form for the TMS potential in the high-sparsity limit. The model's simplicity permits a detailed examination of its phase structure, revealing that regular $k$-gons are critical points of the loss.
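
To make the setup concrete, the following is a minimal sketch of the high-sparsity TMS loss with $k$-gon weight configurations. The 1-sparse input distribution, the zero bias, and all defaults are illustrative assumptions rather than the paper's exact experimental settings.

```python
import numpy as np

def kgon_weights(n, k, m=2):
    """Weight matrix whose first k columns are unit vectors at the
    vertices of a regular k-gon in m = 2 hidden dimensions; the
    remaining n - k columns are zero."""
    W = np.zeros((m, n))
    angles = 2 * np.pi * np.arange(k) / k
    W[0, :k] = np.cos(angles)
    W[1, :k] = np.sin(angles)
    return W

def tms_loss(W, b, num_samples=100_000, seed=0):
    """Monte Carlo estimate of the TMS population loss in the
    high-sparsity limit: exactly one feature active per input,
    with uniform magnitude on [0, 1]."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    idx = rng.integers(0, n, size=num_samples)      # active feature
    mag = rng.uniform(0.0, 1.0, size=num_samples)   # its magnitude
    X = np.zeros((num_samples, n))
    X[np.arange(num_samples), idx] = mag
    X_hat = np.maximum(X @ W.T @ W + b, 0.0)        # ReLU(W^T W x + b)
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))

n = 6
for k in range(3, n + 1):
    print(k, tms_loss(kgon_weights(n, k), b=np.zeros(n)))
```

This gives rough loss levels for the different $k$-gon configurations; in the paper's setting the bias is also optimized, which suppresses interference between represented features and changes the exact values.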

Key results include the local learning coefficients of the various $k$-gons. The paper extends previous work by showing that these coefficients determine phase transitions in the Bayesian posterior as the training sample size $n$ grows: in SLT, the free energy of a phase scales as $F_n \approx nL + \lambda \log n$, where $L$ is the loss and $\lambda$ the local learning coefficient, so the posterior shifts weight from high-loss, low-complexity phases to low-loss, high-complexity ones as $n$ increases. Empirical evidence confirms that the same $k$-gon critical points also govern the behavior of SGD training.

Empirical Verification

The empirical methodology combines MCMC sampling of the posterior with classification of the sampled critical phases, and the analysis reveals a strong alignment between theoretical predictions and empirical observations. Central to this is the transition between the $5$-gon and $6$-gon phases, occurring at around 600 samples. Because these transitions are located by the theoretical local learning coefficients, the phase boundaries are predicted in advance rather than fit post hoc.
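
The mechanism behind such a transition can be illustrated with the SLT free-energy asymptotic $F_n \approx nL + \lambda \log n$: a phase with higher loss but lower learning coefficient $\lambda$ dominates at small $n$ and is overtaken once $n$ grows. The loss and $\lambda$ values in the sketch below are hypothetical stand-ins, not the paper's measured values.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical (loss, local learning coefficient) pairs for two phases;
# stand-in numbers chosen for illustration only.
L5, lam5 = 0.070, 4.0   # "5-gon": higher loss, lower complexity
L6, lam6 = 0.060, 5.0   # "6-gon": lower loss, higher complexity

def free_energy_gap(n):
    """F_n(5-gon) - F_n(6-gon) under F_n ~ n * L + lambda * log(n)."""
    return (n * L5 + lam5 * np.log(n)) - (n * L6 + lam6 * np.log(n))

# The posterior reweights from the 5-gon to the 6-gon phase where the
# free-energy gap changes sign.
n_star = brentq(free_energy_gap, 10, 10_000)
print(f"predicted transition near n = {n_star:.0f}")
```

With these stand-in numbers the crossover lands near $n \approx 650$, the same order as the transition around 600 samples reported in the paper.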

Figures, such as the t-SNE visualizations, substantiate that the posterior samples indeed cluster around the predicted critical points. These clusters match the theoretical occupancy predictions, and the temporal evolution of the MCMC chains tracks the theoretical loss values.

Dynamical Phase Transitions

The discussion then turns to dynamical transitions during SGD training. The findings show that SGD trajectories plateau at loss levels matching the critical points, visited in order of increasing complexity. The authors argue that these transitions mirror the underlying Bayesian phase transitions: each successive plateau corresponds to a critical point of lower loss and higher local learning coefficient.
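
The qualitative behavior is straightforward to reproduce in a minimal training loop; the hyperparameters and the 1-sparse data distribution below are illustrative assumptions, not the paper's exact setup.

```python
import torch

torch.manual_seed(0)
n, m = 6, 2                                   # features, hidden dimensions
W = (0.1 * torch.randn(m, n)).requires_grad_()
b = torch.zeros(n, requires_grad=True)
opt = torch.optim.SGD([W, b], lr=0.05)

def batch(size=1024):
    """1-sparse inputs: one uniformly chosen active feature with
    uniform magnitude on [0, 1]."""
    x = torch.zeros(size, n)
    x[torch.arange(size), torch.randint(0, n, (size,))] = torch.rand(size)
    return x

for step in range(20_001):
    x = batch()
    x_hat = torch.relu(x @ W.T @ W + b)       # ReLU(W^T W x + b)
    loss = ((x - x_hat) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 2000 == 0:
        # Long flat stretches in this log are the plateaus; in the paper
        # they sit near the theoretical losses of k-gon critical points.
        print(step, round(loss.item(), 4))
```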

Figures showing the weight vectors at successive stages of training, together with the evolution of the estimated learning coefficients, graphically illustrate dynamical transitions such as $4^{++} \rightarrow 4^{+} \rightarrow 5 \rightarrow 5^{+}$.

Implications and Future Directions

The paper's results enhance the understanding of SGD and Bayesian learning processes through the lens of phase transitions. The identification of $k$-gons as critical structures reveals deeper geometric and inferential properties underlying neural network training and offers new tools for studying model complexity.

Moreover, the findings have implications for model interpretability, AI safety, and generalization in neural networks. The methodology introduced, particularly the estimation of local learning coefficients, provides a diagnostic for model complexity that could be applied to neural network architectures well beyond TMS.
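
As a starting point for such a diagnostic, here is a rough sketch of one way to estimate a local learning coefficient with SGLD around a trained parameter $w^*$, of the kind used in the SLT literature. The update rule, step size, localization strength, and chain length are illustrative assumptions, not a tuned implementation.

```python
import math
import torch

def estimate_llc(loss_fn, w_star, n, num_steps=5_000, eps=1e-4, gamma=100.0):
    """Rough SGLD sketch: sample a tempered posterior localized at w_star
    and estimate lambda as n * beta * (E[L(w)] - L(w_star)), with inverse
    temperature beta = 1 / log(n). Hyperparameters are illustrative."""
    beta = 1.0 / math.log(n)
    L_star = loss_fn(w_star).item()
    w = w_star.detach().clone().requires_grad_()
    running = 0.0
    for _ in range(num_steps):
        loss = loss_fn(w)
        grad = torch.autograd.grad(loss, w)[0]
        # Langevin step on n*beta*L(w) + (gamma/2)*||w - w_star||^2
        drift = n * beta * grad + gamma * (w - w_star)
        w = (w - 0.5 * eps * drift
             + math.sqrt(eps) * torch.randn_like(w)).detach().requires_grad_()
        running += loss.item()
    return n * beta * (running / num_steps - L_star)
```

Applied to the TMS loss around successive SGD plateaus, the resulting $\hat\lambda$ estimates can then be compared with the theoretical learning coefficients of the $k$-gons.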

Conclusion

This paper provides a mathematically rigorous and empirically validated study of phase transitions in TMS, revealing critical structures that govern both Bayesian and dynamical learning processes. It presents a compelling connection between theoretical principles and practical learning dynamics, offering tools and insights that extend beyond the specific model to broader applications in AI research. The use of local learning coefficients as a measure of complexity stands as a significant contribution to understanding and leveraging complexity in learning models.
