Extend slow-SDE analysis beyond the 2-scheme scaling of β2

Extend the slow SDE/ODE analysis and implicit-bias characterization for Adam and the broader AGM framework from the 2-scheme regime (where 1−β2=Θ(η^2)) to the intermediate 1.5-scheme and other scalings of 1−β2, deriving the correct continuous-time limit and identifying the associated sharpness regularizer that these methods implicitly minimize.

Background

The paper develops a slow SDE-based analysis for Adam and related adaptive gradient methods under the "2-scheme" scaling, where the second-moment decay parameter satisfies 1−β2=Θ(η^2). This scaling is chosen to make the preconditioner evolve on the same slow timescale as the manifold dynamics, enabling a rigorous approximation and an implicit-bias characterization.
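To make the scaling concrete, the following minimal sketch computes β2 and the approximate averaging window of the second-moment estimate for a given learning rate. It assumes that a "p-scheme" means 1−β2 = c·η^p; the text states this only for p = 2, so the p = 1.5 reading is inferred from the name. The function names and the constant c are hypothetical, introduced for illustration only. It relies on the standard fact that an exponential moving average with decay β2 averages over roughly 1/(1−β2) steps.

```python
def beta2_for_scheme(eta: float, p: float, c: float = 1.0) -> float:
    """Return beta2 such that 1 - beta2 = c * eta**p.

    Assumption: a "p-scheme" means 1 - beta2 = Theta(eta**p); the constant
    c is a hypothetical stand-in for the hidden Theta constant.
    """
    one_minus_beta2 = c * eta ** p
    if not 0.0 < one_minus_beta2 < 1.0:
        raise ValueError("c * eta**p must lie strictly between 0 and 1")
    return 1.0 - one_minus_beta2


def ema_window(beta2: float) -> float:
    """Approximate number of steps an EMA with decay beta2 averages over."""
    return 1.0 / (1.0 - beta2)


if __name__ == "__main__":
    eta = 0.01
    for p in (2.0, 1.5):
        b2 = beta2_for_scheme(eta, p)
        print(f"p={p}: beta2={b2:.6f}, window ~ {ema_window(b2):.0f} steps")
```

Under these assumptions, with η = 0.01 and c = 1, the 2-scheme gives a window of about 10^4 steps (order η^-2), while the 1.5-scheme gives about 10^3 steps (order η^-1.5). In the second case the preconditioner would adapt faster than in the regime analyzed in the paper, which is one way to see why the existing argument may not carry over directly.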

The authors explicitly note that extending this analysis beyond the 2-scheme—particularly to the intermediate 1.5-scheme or other scalings of 1−β2—remains unresolved and is left for future work. Such extensions would test the robustness of the current theory and potentially reveal qualitatively different implicit regularizers.

References

Despite these advances, several important avenues remain open. First, we have focused on the “2-scheme” regime (where $1-\beta_2=O(\eta^2)$) in order to track Adam’s preconditioner over a long timescale; extending our analysis to the intermediate 1.5-scheme or other scalings of $1-\beta_2$ is left for future work.