
Extend the BEoS generalization framework beyond two-layer ReLU networks

Extend the theoretical framework that analyzes gradient-descent solutions below the Edge of Stability (BEoS) via data-dependent weighted path/variation norms, currently developed for two-layer fully-connected ReLU networks, to deeper networks or to architectures with specific inductive biases, such as convolutional neural networks (CNNs).


Background

The paper establishes generalization guarantees and phenomena for overparameterized two-layer fully-connected ReLU networks trained below the Edge of Stability (BEoS), using a function-space characterization via data-dependent weighted path/variation norms. These results connect data geometry (e.g., low-dimensional mixtures and isotropic radial profiles) to implicit regularization and generalization behavior of stable solutions reached by gradient descent.
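As a point of reference only (not the paper's exact definitions), a minimal sketch of the objects involved: a two-layer ReLU network, the standard unweighted path/variation norm it induces, and the usual stability condition that characterizes training below the Edge of Stability at step size $\eta$. The paper's norms additionally carry data-dependent weights, which are not reproduced here.

\[
f_\theta(x) \;=\; \sum_{j=1}^{m} a_j \,\sigma\!\left(w_j^\top x + b_j\right), \qquad \sigma(t) = \max(t, 0),
\]
\[
\|\theta\|_{\mathrm{path}} \;=\; \sum_{j=1}^{m} |a_j|\,\|w_j\|_2 \quad \text{(unweighted path/variation norm)},
\]
\[
\lambda_{\max}\!\left(\nabla^2_\theta L(\theta)\right) \;\le\; \frac{2}{\eta} \quad \text{(stable, below-Edge-of-Stability regime)}.
\]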

While the approach is width-agnostic and provides both upper and lower bounds in the two-layer setting, the authors note that extending this analysis to deeper architectures or those with strong inductive biases (like CNNs) is nontrivial. They explicitly state that this extension is left for future work, highlighting a concrete gap in current theory regarding BEoS-induced implicit bias and generalization in more complex network architectures.

References

Our theoretical results are derived for two-layer fully-connected ReLU networks, a cornerstone for theoretical analysis. Extending this framework to deeper networks or architectures with specific inductive biases (e.g., CNNs) is a significant undertaking that we leave for future work.

Generalization Below the Edge of Stability: The Role of Data Geometry (2510.18120 - Liang et al., 20 Oct 2025) in Introduction – Scope of Analysis