Incorporate weight decay (e.g., AdamW) into the implicit-bias analysis

Characterize how weight decay or decoupled decay terms—such as the W-term in AdamW—modify the effective sharpness regularizer within the slow SDE/ODE framework for Adam-like methods.

Background

The presented analysis does not cover weight decay or decoupled decay mechanisms commonly used in adaptive methods such as AdamW. These terms can substantially alter optimization trajectories and, potentially, the resulting implicit regularization.
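
To make the distinction concrete, the following is a minimal sketch of one scalar update step that contrasts coupled L2 regularization with AdamW-style decoupled decay. It is an illustration, not part of the analysis discussed here. The function name `adam_step`, the default hyperparameters, and the convention of scaling the decoupled decay by the learning rate are assumptions chosen for the example.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=True):
    """One scalar Adam-style step (hypothetical helper); returns (theta, m, v).

    decoupled=False: L2 penalty folded into the gradient (Adam + L2).
    decoupled=True:  decay applied directly to the parameter (AdamW-style).
    """
    if not decoupled:
        # Coupled L2: the decay term enters the moment estimates, so it is
        # rescaled by the adaptive preconditioner like any other gradient.
        grad = grad + weight_decay * theta

    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad

    # Bias correction for the zero-initialized moments (t starts at 1).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    step = lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # Decoupled decay: bypasses the preconditioner entirely.
        # Assumption: decay is scaled by lr and uses the pre-update parameter.
        step += lr * weight_decay * theta

    return theta - step, m, v


# With a zero loss gradient, only the decay term moves the parameter.
theta_decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.01,
                                  weight_decay=0.1, decoupled=True)
theta_coupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.01,
                                weight_decay=0.1, decoupled=False)
```

In this toy setting the decoupled variant shrinks the parameter by `lr * weight_decay * theta`, while the coupled variant takes a normalized step of size close to `lr` that is nearly independent of the decay coefficient. This difference in how the decay interacts with the preconditioner is one reason the two mechanisms may need separate treatment in an implicit-bias analysis.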

The authors identify the need to extend the framework to include weight-decay effects and to determine how such terms change the effective sharpness objective that Adam implicitly minimizes.

References

Despite these advances, several important avenues remain open. Finally, our approach cannot cover weight-decay or decoupled decay terms such as the $W$-term in AdamW; characterizing how weight decay alters the effective sharpness regularizer is an important direction for follow-on studies.