Is Grokking a Computational Glass Relaxation? (2505.11411v1)

Published 16 May 2025 in cs.LG and cond-mat.dis-nn

Abstract: Understanding the generalizability of neural networks (NNs) remains a central question in deep learning research. The special phenomenon of grokking, where NNs abruptly generalize long after the training performance reaches a near-perfect level, offers a unique window to investigate the underlying mechanisms of NNs' generalizability. Here we propose an interpretation for grokking by framing it as a computational glass relaxation: viewing NNs as a physical system where parameters are the degrees of freedom and train loss is the system energy, we find the memorization process resembles a rapid cooling of a liquid into a non-equilibrium glassy state at low temperature and the later generalization is like a slow relaxation towards a more stable configuration. This mapping enables us to sample NNs' Boltzmann entropy (density of states) landscape as a function of training loss and test accuracy. Our experiments in transformers on arithmetic tasks suggest that there is NO entropy barrier in the memorization-to-generalization transition of grokking, challenging previous theory that defines grokking as a first-order phase transition. We identify a high-entropy advantage under grokking, an extension of prior work linking entropy to generalizability but much more significant. Inspired by grokking's far-from-equilibrium nature, we develop a toy optimizer WanD based on Wang-Landau molecular dynamics, which can eliminate grokking without any constraints and find high-norm generalizing solutions. This provides strictly defined counterexamples to theory attributing grokking solely to weight norm evolution towards the Goldilocks zone and also suggests new potential ways for optimizer design.

Summary

Insights on the Computational Interpretation of Grokking

The paper's central focus is the phenomenon known as "grokking" in neural networks, particularly in modular arithmetic tasks. Grokking refers to the delayed, sudden generalization of a neural network long after it has reached near-perfect training accuracy. The authors propose a novel interpretation of grokking through the lens of computational glass relaxation, drawing parallels between neural networks and physical systems and using concepts from statistical mechanics and glass physics to interpret network behavior during grokking.

The authors argue that grokking may not be a first-order phase transition, as previously suggested by Rubin et al., positing instead that it resembles a glassy relaxation process. This perspective stems from the observation that there is no entropy barrier between the memorization and generalization states, challenging the notion of grokking as a transition across a free energy barrier. By mapping neural networks onto physical systems, the authors are able to explore the entropy landscape as a function of training loss and test accuracy, finding that grokking is more akin to a slow relaxation towards a high-entropy, stable state than to a sharp phase transition.
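
Concretely, the glass analogy can be written down as follows. The notation here (S, Omega, and the treatment of Omega as a density of states over loss and test accuracy) is an illustrative shorthand, not necessarily the paper's exact formulation:

```latex
% Parameters \theta are the degrees of freedom and the training loss plays
% the role of the system energy. The Boltzmann entropy at training loss L
% and test accuracy A is the log of the density of states \Omega(L, A),
% i.e. the (log) volume of parameter configurations realizing that pair:
S(L, A) = k_B \ln \Omega(L, A),
\qquad
\Omega(L, A) \propto \int \! d\theta \;
  \delta\!\bigl(\mathcal{L}_{\mathrm{train}}(\theta) - L\bigr)\,
  \delta\!\bigl(\mathrm{Acc}_{\mathrm{test}}(\theta) - A\bigr).
```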

The research identifies a significant high-entropy advantage under grokking, suggesting that higher-entropy states correlate with better generalization. This observation extends previous work linking entropy to generalizability, with the effect being markedly stronger in the grokking setting. The paper further corroborates the insight by showing that networks constrained to a fixed weight norm can eliminate grokking yet still exhibit the entropy advantage.

Numerically, the paper shows that reducing or eliminating grokking does not necessarily degrade generalization performance. Using WanD, a toy optimizer based on Wang-Landau molecular dynamics, the authors demonstrate that high-norm generalizing solutions can be found, challenging theories that attribute grokking solely to weight norm evolution towards a specific range (the "Goldilocks zone"). The optimizer achieves generalization comparable to conventional methods and hints at promising new avenues for optimizer design.
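
To make the Wang-Landau idea concrete, the sketch below runs a toy flat-histogram sampler over the training-loss axis of a small quadratic "loss". It illustrates the generic Wang-Landau mechanism that WanD builds on, not the authors' optimizer: the loss function, bin range, step size, and flatness criterion are all assumptions chosen for the example.

```python
# Toy Wang-Landau flat-histogram sampler over a training-loss axis.
# Illustrative only: the quadratic "loss", bin range, step size, and
# flatness test are assumptions, not the paper's WanD settings.
import numpy as np

rng = np.random.default_rng(0)

def train_loss(theta):
    # Stand-in for a network's training loss.
    return float(np.mean((theta - 1.0) ** 2))

theta = rng.normal(size=10)

bins = np.linspace(0.0, 5.0, 51)      # 50 loss bins (assumed range)
log_g = np.zeros(len(bins) - 1)       # running log density-of-states estimate
hist = np.zeros(len(bins) - 1)        # visit histogram for the flatness test
ln_f = 1.0                            # modification factor, refined over time

def bin_index(loss):
    return int(np.clip(np.digitize(loss, bins) - 1, 0, len(log_g) - 1))

current = bin_index(train_loss(theta))
for sweep in range(100_000):
    proposal = theta + 0.05 * rng.normal(size=theta.shape)   # random-walk move
    new = bin_index(train_loss(proposal))
    # Accept with probability g(current)/g(new): loss levels with a lower
    # current density-of-states estimate are favored, so the walk keeps
    # exploring instead of settling into a single loss basin.
    if np.log(rng.random()) < log_g[current] - log_g[new]:
        theta, current = proposal, new
    log_g[current] += ln_f
    hist[current] += 1
    # Once the histogram of visited bins is roughly flat, halve ln_f and reset.
    if sweep > 0 and sweep % 10_000 == 0:
        visited = hist[hist > 0]
        if visited.min() > 0.8 * visited.mean():
            ln_f *= 0.5
            hist[:] = 0

# log_g now approximates the relative Boltzmann entropy ln Omega(loss).
```

The key design choice illustrated is the acceptance rule: moves toward rarely visited loss levels are preferentially accepted, which is what lets a Wang-Landau walker map out the entropy profile rather than minimizing the loss directly.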

The implications of this research are multifaceted. Practically, it shows potential for designing optimizers that exploit higher-entropy configurations to enhance generalization without the grokking lag. Theoretically, it opens up discussion of physical analogies for neural networks and the application of statistical-mechanics concepts to understanding learning dynamics. Future research might deepen this analogy or apply other statistical-physics methodologies to the development of better learning algorithms.

In summary, the authors assert that grokking is essentially a computational glass relaxation process. This interpretation provides fresh insight into neural network behavior, offers a novel perspective on generalization, and may guide new methodologies in the field of AI.
