Distance‑to‑manifold criterion for feature learning strength

Establish whether, in architectures that exhibit saddle-to-saddle dynamics, the Euclidean distance from the initial weights to invariant manifolds associated with low effective width determines the strength of feature learning, including the prominence of plateaus and the trajectory’s proximity to saddles.

Background

Section 6 of Zhang et al. (2025) investigates how initialization scale modulates plateau duration and how closely trajectories approach saddles. The authors conjecture that a geometric distance-to-manifold criterion governs the strength of feature learning, refining prior heuristics based on layer weight scales or initial rank.

Confirming this conjecture would yield a practical diagnostic for tuning initialization to elicit the desired strength of feature learning.
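
To make the proposed criterion concrete, the following is a minimal sketch of how such a distance could be computed in one simple setting. It is an illustration under stated assumptions, not the paper's definition: it assumes a bias-free two-layer network whose activation satisfies sigma(0) = 0, and it takes the "low effective width" manifold to be the set of parameters in which all but k hidden units have every incoming and outgoing weight equal to zero (such units receive zero gradient, so the set is invariant under gradient flow). The function name and arguments are hypothetical.

```python
import numpy as np

def distance_to_low_width_manifold(W_in, W_out, k):
    """Euclidean distance from the weights (W_in, W_out) to the set of
    two-layer networks with at most k active hidden units.

    ASSUMPTION (illustrative, not taken from the paper): a hidden unit
    is "inactive" when all of its incoming and outgoing weights are zero.

    W_in  : array of shape (H, D), incoming weights of the H hidden units
    W_out : array of shape (H,) or (H, O), outgoing weights per hidden unit
    k     : target effective width (number of units allowed to stay active)
    """
    W_in = np.asarray(W_in, dtype=float)
    H = W_in.shape[0]
    W_out = np.asarray(W_out, dtype=float).reshape(H, -1)

    # Squared norm of all weights attached to each hidden unit.
    unit_sq = np.sum(W_in**2, axis=1) + np.sum(W_out**2, axis=1)

    # The nearest point on the manifold zeroes the H - k units with the
    # smallest norm, so the squared distance is the sum of those norms.
    n_removed = max(H - k, 0)
    removed = np.sort(unit_sq)[:n_removed]
    return float(np.sqrt(removed.sum()))
```

Under these assumptions the distance scales linearly with the initialization scale: multiplying all weights by a factor s multiplies the returned value by s. Comparing this value against plateau duration across initializations is one way the conjecture could be probed empirically.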

References

In architectures that have saddle-to-saddle dynamics, we conjecture that the distance from the initial weights to invariant manifolds associated with low effective width determines the strength of feature learning.

Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures (2512.20607 - Zhang et al., 23 Dec 2025) in Section 6 — Effect of initialization scale