Impact of the training trajectory on diffusion model quality

Determine whether the entire training trajectory, rather than only the final training/validation loss, affects the final generative quality of the U-Net denoising diffusion probabilistic model (DDPM) trained to learn the score function of Navier–Stokes Kolmogorov-flow trajectories. If it does, characterize which aspects of the training dynamics (e.g., learning-rate annealing) are responsible for the effect.

Background

Across experiments, the authors observe cases where similar or even lower loss values do not correspond to comparable generative quality, particularly for schedule-free AdamW and for the warmup-stable-decay (wsd) learning-rate schedule. This suggests that the final loss alone may not predict sample quality in this diffusion training setup.

Motivated by these findings, the authors posit that the path of optimization, including scheduling and cooldown, could be crucial for the eventual quality of the learned diffusion model, and explicitly flag this as an open direction for future work.
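To make "scheduling and cooldown" concrete, the following is a minimal sketch of a warmup-stable-decay (wsd) learning-rate schedule: linear warmup, a constant plateau, then a linear cooldown to zero. The function name `wsd_lr`, the default fractions, the base learning rate, and the linear shape of the cooldown are illustrative assumptions, not the configuration used in the benchmark.

```python
def wsd_lr(step, total_steps, base_lr=1e-3, warmup_frac=0.05, cooldown_frac=0.2):
    """Illustrative warmup-stable-decay schedule (all defaults are assumptions).

    step is 0-indexed and must satisfy 0 <= step < total_steps.
    """
    warmup_steps = int(warmup_frac * total_steps)
    cooldown_steps = int(cooldown_frac * total_steps)
    cooldown_start = total_steps - cooldown_steps

    if step < warmup_steps:
        # Linear warmup: reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    if step < cooldown_start:
        # Stable phase: constant learning rate.
        return base_lr
    # Cooldown: linear decay toward zero over the final cooldown_steps.
    return base_lr * (total_steps - step) / cooldown_steps


# Two runs can end at a similar loss while following different schedules;
# logging the full schedule alongside the loss makes the trajectory comparable.
schedule = [wsd_lr(s, total_steps=100) for s in range(100)]
```

Varying `cooldown_frac` while holding everything else fixed is one simple way to probe whether the annealing phase, rather than the final loss value, drives differences in sample quality.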

References

We conjecture that the entire training trajectory might impact the final model quality, and leave this open for future work.

— Optimization Benchmark for Diffusion Models on Dynamical Systems (2510.19376 - Schaipp, 22 Oct 2025) in Conclusion (Section 4)
