Role of learning-rate annealing in schedule-free AdamW’s generative quality
Investigate whether the inferior generative quality observed when training the U-Net DDPM with schedule-free AdamW (constant learning rate after warmup) is caused by the absence of learning-rate annealing, and determine whether adding a linear cooldown consistently restores generative quality across hyperparameter configurations for this diffusion training task.
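For concreteness, the sketch below illustrates the schedule shape in question: a linear warmup, a constant plateau (the behavior of schedule-free AdamW after warmup), and an appended linear cooldown. This is a minimal illustration in plain PyTorch, not the paper's code; the stand-in model and the values of total_steps, warmup_steps, and cooldown_frac are assumptions, and the paper applies the cooldown on top of schedule-free AdamW rather than standard AdamW as used here.

```python
# Sketch of the schedule under test: linear warmup -> constant plateau ->
# linear cooldown to zero. All names and hyperparameter values are
# illustrative assumptions, not the paper's exact setup.
import torch

def warmup_constant_cooldown(step: int, total_steps: int,
                             warmup_steps: int, cooldown_frac: float) -> float:
    """Multiplicative LR factor: ramps 0 -> 1, holds at 1, then decays 1 -> 0."""
    cooldown_start = int(total_steps * (1.0 - cooldown_frac))
    if step < warmup_steps:
        return (step + 1) / warmup_steps          # linear warmup
    if step >= cooldown_start:
        remaining = total_steps - step
        return max(remaining / (total_steps - cooldown_start), 0.0)  # linear cooldown
    return 1.0                                    # constant plateau

model = torch.nn.Linear(8, 8)  # stand-in for the U-Net DDPM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
total_steps, warmup_steps = 10_000, 500           # assumed values
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda s: warmup_constant_cooldown(
        s, total_steps, warmup_steps, cooldown_frac=0.2),
)

for step in range(total_steps):
    # ... forward pass, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()
```

Comparing a run with cooldown_frac > 0 against the constant-after-warmup baseline (cooldown_frac = 0) across several hyperparameter configurations would directly test whether the cooldown consistently restores generative quality.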
References
We conjecture that this is partially due to the missing learning-rate annealing: adding a linear cooldown improves generative quality, at least for some hyperparameter configurations.
— Optimization Benchmark for Diffusion Models on Dynamical Systems
(Schaipp, arXiv:2510.19376, 22 Oct 2025), Section 3 (Results), paragraph “Mismatch of loss value and generative quality for schedule-free AdamW”