Mathematical framework for feature emergence linked to gradient dynamics
Establish a mathematical framework that precisely characterizes which features emerge during training, how they emerge, and under which conditions they do so, while remaining closely connected to the gradient dynamics of learning for models trained on complex structured inputs.
References
While the phenomenon of grokking, i.e., delayed generalization, has been studied extensively, it remains an open problem whether there is a mathematical framework that characterizes what kind of features will emerge, how and in which conditions it happens, and is still closely connected with the gradient dynamics of the training, for complex structured inputs.
Source: Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking (2509.21519, Tian, 25 Sep 2025), Abstract