Incorporate weight decay (e.g., AdamW) into the implicit-bias analysis
Characterize how weight decay or decoupled decay terms—such as the W-term in AdamW—modify the effective sharpness regularizer within the slow SDE/ODE framework for Adam-like methods.
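To make the distinction concrete, the following is a minimal pure-Python sketch of one Adam-style step with either a coupled L2 penalty or a decoupled decay term. The function name `adam_step`, its parameters, and the choice to scale the decoupled term by the learning rate are illustrative assumptions, not taken from the paper. The point it shows is only where the decay enters: a coupled penalty passes through the adaptive preconditioner, while the decoupled term acts on the weights directly.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=True):
    """One Adam-style step on lists of floats (hypothetical helper).

    Returns the updated (w, m, v). `t` is the 1-based step count.
    """
    if not decoupled:
        # Coupled L2 penalty: the decay is added to the gradient, so it is
        # later rescaled by the adaptive preconditioner 1 / (sqrt(v_hat) + eps).
        g = [gi + weight_decay * wi for gi, wi in zip(g, w)]

    # Exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]

    # Bias correction.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]

    new_w = []
    for wi, mh, vh in zip(w, m_hat, v_hat):
        adaptive_step = lr * mh / (math.sqrt(vh) + eps)
        # Decoupled decay (AdamW-style): applied to the weight directly,
        # bypassing the preconditioner. Scaling by lr is an assumed convention.
        decay = lr * weight_decay * wi if decoupled else 0.0
        new_w.append(wi - adaptive_step - decay)
    return new_w, m, v


# Usage: with a zero loss gradient, the two variants move the weight differently.
w0, zeros = [2.0], [0.0]
w_decoupled, _, _ = adam_step(w0, zeros, zeros, zeros, t=1, lr=0.01,
                              weight_decay=0.1, decoupled=True)
w_coupled, _, _ = adam_step(w0, zeros, zeros, zeros, t=1, lr=0.01,
                            weight_decay=0.1, decoupled=False)
```

In the coupled case the first step has magnitude close to `lr` regardless of the decay strength, because the penalty is normalized along with the gradient; in the decoupled case the weight shrinks by the factor `1 - lr * weight_decay`. How that extra term changes the effective sharpness regularizer is exactly the open question stated here.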
References
Despite these advances, several important avenues remain open. Finally, our approach cannot cover weight-decay or decoupled decay terms such as the $W$-term in AdamW; characterizing how weight decay alters the effective sharpness regularizer is an important direction for follow-on studies.
— Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
(2511.02773 - Li et al., 4 Nov 2025) in Conclusions