
Mathematical framework for feature emergence linked to gradient dynamics

Establish a mathematical framework that precisely characterizes which features emerge during training, how they emerge, and under which conditions they do so, while remaining closely connected to the gradient dynamics of learning for models trained on complex structured inputs.


Background

The paper studies grokking (delayed generalization) in neural networks and seeks a first-principles explanation of how features emerge during training. Despite extensive empirical study of grokking, the authors note that a rigorous framework connecting feature emergence to the gradient dynamics of training, especially for structured inputs (e.g., group arithmetic), has been lacking.
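
Grokking is typically demonstrated on small algebraic tasks. The sketch below is a hypothetical minimal reproduction in PyTorch, following the standard modular-addition recipe rather than the paper's exact setup; the modulus, network width, learning rate, and weight-decay strength are illustrative assumptions. Under strong weight decay, train accuracy saturates long before test accuracy rises.

```python
# Minimal grokking sketch (illustrative, not the paper's experiments):
# a small MLP trained on modular addition with strong weight decay.
import torch
import torch.nn as nn

P = 97            # modulus for the arithmetic task (assumed)
FRAC_TRAIN = 0.3  # fraction of (a, b) pairs used for training (assumed)

# All pairs (a, b) with label (a + b) mod P, one-hot encoded.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
x = torch.cat([nn.functional.one_hot(pairs[:, 0], P),
               nn.functional.one_hot(pairs[:, 1], P)], dim=1).float()

perm = torch.randperm(len(pairs))
n_train = int(FRAC_TRAIN * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(x[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x[train_idx]).argmax(1)
                         == labels[train_idx]).float().mean()
            test_acc = (model(x[test_idx]).argmax(1)
                        == labels[test_idx]).float().mean()
        print(f"step {step:6d}  train acc {train_acc:.3f}  test acc {test_acc:.3f}")
```

In runs of this kind, the printed train accuracy typically reaches 1.0 early while test accuracy stays near chance for many steps before climbing; it is precisely this delayed transition that the sought framework should explain from gradient dynamics.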

This open problem motivates the Li framework proposed in the paper, which partitions learning dynamics into lazy, independent, and interactive stages and introduces an energy function governing feature emergence. The problem, as stated, asks for a general mathematical characterization that remains tightly aligned with the gradient dynamics of training.

References

While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open problem whether there is a mathematical framework that characterizes what kinds of features will emerge, how and under which conditions this happens, and that remains closely connected with the gradient dynamics of training, for complex structured inputs.