Role of learning-rate annealing in schedule-free AdamW’s generative quality

Investigate whether the inferior generative quality observed when training the U-Net DDPM with schedule-free AdamW (constant learning rate after warmup) is caused by the absence of learning-rate annealing, and determine whether adding a linear cooldown consistently restores generative quality across hyperparameter configurations for this diffusion training task.
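For concreteness, the two schedules under comparison can be expressed as a per-step multiplier on a base learning rate. The sketch below is not from the paper; the parameter names (`warmup_steps`, `cooldown_frac`) and the exact cooldown placement are illustrative assumptions. With `cooldown_frac=0.0` it reproduces the schedule-free setting (linear warmup, then constant); with `cooldown_frac > 0` it appends a linear cooldown to zero over the final fraction of training.

```python
def lr_multiplier(step, total_steps, warmup_steps, cooldown_frac=0.0):
    """Per-step factor applied to a base learning rate.

    cooldown_frac=0.0: linear warmup, then constant (schedule-free setting).
    cooldown_frac>0.0: same, plus a linear cooldown to zero at the end.
    """
    if step < warmup_steps:  # linear warmup
        return (step + 1) / warmup_steps
    cooldown_start = int(total_steps * (1.0 - cooldown_frac))
    if step < cooldown_start:  # constant phase
        return 1.0
    # linear cooldown from 1 to 0 over the remaining steps
    return (total_steps - step) / (total_steps - cooldown_start)
```

Whether appending this cooldown reliably closes the quality gap, rather than doing so only for some hyperparameter configurations, is precisely what the open question asks.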

Background

In the experiments, schedule-free AdamW nearly matches AdamW's loss while requiring no learning-rate schedule, yet the trajectories it samples are of inferior generative quality. This mismatch persists across seeds and tuning.

The authors hypothesize that the missing learning-rate annealing is responsible and show that introducing a cooldown can improve sample quality in some configurations, leaving the conjecture to be validated systematically.
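A systematic validation could start from a training loop along the following lines. This is a hedged sketch, not the paper's code: it assumes the open-source `schedulefree` package (which provides `AdamWScheduleFree` with internal warmup via `warmup_steps` and explicit `train()`/`eval()` mode switches), and the stand-in model, objective, step counts, and in-loop learning-rate scaling are illustrative only.

```python
import schedulefree
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the U-Net DDPM
optimizer = schedulefree.AdamWScheduleFree(
    model.parameters(), lr=1e-3, warmup_steps=100  # warmup handled internally
)

total_steps, cooldown_start = 1000, 900  # cool down over the last 10% of steps
base_lr = optimizer.param_groups[0]["lr"]

optimizer.train()  # schedule-free optimizers require explicit train mode
for step in range(total_steps):
    x = torch.randn(32, 8)
    loss = (model(x) - x).pow(2).mean()  # dummy denoising-style objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Linear cooldown: decay the base lr to zero over the final steps.
    # (Omit this block to recover the constant-after-warmup baseline.)
    if step >= cooldown_start:
        frac = (total_steps - step - 1) / (total_steps - cooldown_start)
        for group in optimizer.param_groups:
            group["lr"] = base_lr * frac

optimizer.eval()  # switch to averaged weights before sampling/evaluation
```

Running both variants (with and without the cooldown block) across seeds and hyperparameter configurations, and comparing loss against a sample-quality metric, would test whether the cooldown consistently restores generative quality.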

References

We conjecture that this is partially due to the missing learning-rate annealing: adding a linear cooldown improves generative quality, at least for some hyperparameter configurations.

Optimization Benchmark for Diffusion Models on Dynamical Systems (2510.19376 - Schaipp, 22 Oct 2025) in Section 3 (Results), paragraph “Mismatch of loss value and generative quality for schedule-free AdamW”