Timescale separation in higher-degree homogeneous activations
Determine whether two-layer networks in which the unit activation function is a homogeneous polynomial of degree p > 2 in the unit weights exhibit a timescale separation between units under gradient flow from small initialization, and ascertain whether this separation is stronger than in the quadratic (p = 2) case.
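As a rough intuition for why the separation might be stronger when p > 2, consider a toy scalar model that is an assumption of this sketch and not the analysis in the cited paper: near the origin, a single unit's weight norm r is taken to grow as dr/dt = c r^(p-1), where c > 0 is a hypothetical per-unit growth rate. The time to escape from a small initial norm r0 to a fixed norm R is then available in closed form, and the gap between a fast and a slow unit can be compared across degrees. The names `escape_time` and `escape_gap` are illustrative.

```python
import math


def escape_time(c, r0, R, p):
    """Time for the toy ODE dr/dt = c * r**(p-1) to grow from r0 to R.

    Assumes c > 0, 0 < r0 < R, and integer degree p >= 2.
    """
    if p == 2:
        # Linear ODE: exponential growth, so the escape time is logarithmic in R/r0.
        return math.log(R / r0) / c
    k = p - 2
    # Separable ODE: integrate r**(-(p-1)) dr = c dt from r0 to R.
    return (r0 ** (-k) - R ** (-k)) / (k * c)


def escape_gap(c_fast, c_slow, r0, R, p):
    """Difference in escape times between a slow unit and a fast unit."""
    return escape_time(c_slow, r0, R, p) - escape_time(c_fast, r0, R, p)


if __name__ == "__main__":
    # Hypothetical growth rates 2.0 (fast unit) and 1.0 (slow unit), target norm R = 1.
    for r0 in (1e-2, 1e-4):
        for p in (2, 3, 4):
            gap = escape_gap(2.0, 1.0, r0, 1.0, p)
            print(f"r0={r0:g}  p={p}  gap={gap:.4g}")
```

In this toy model, with equal initial norms, the ratio of the two escape times is c_fast / c_slow for every p, but the absolute gap grows like log(1/r0) for p = 2 and like r0^(-(p-2)) for p > 2 as r0 shrinks. This is consistent with the conjecture but does not establish it, since it ignores interactions between units and the dynamics of the weight directions.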
References
If φ(x; u) is a homogeneous polynomial of degree p > 2 in the weights u, we conjecture that there is still a timescale separation between units, possibly even stronger than the quadratic case.
Source: Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures
(arXiv:2512.20607, Zhang et al., 23 Dec 2025), Section 5.2 (Quadratic case: timescale separation between units), paragraph "Higher-order polynomial activation"