Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling (2305.10769v5)

Published 18 May 2023 in cs.LG and cs.CV

Abstract: Diffusion Probabilistic Models (DPMs) have made impressive advances across various machine learning domains. However, generating high-quality samples typically requires a large number of sampling steps, which precludes real-time synthesis. Traditional accelerated sampling algorithms based on knowledge distillation rely on pre-trained model weights and discrete time-step schedules, necessitating additional training sessions to achieve their goals. To address these issues, we propose Catch-Up Distillation (CUD), which encourages the current-moment output of the velocity estimation model to "catch up" with its previous-moment output. Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current-moment output with both the ground-truth label and the previous-moment output, using Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates. Furthermore, we investigate the design space of CUD under continuous time-step scenarios and analyze how to choose suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we obtain an FID of 2.80 by sampling in 15 steps under one-session training, and a new state-of-the-art FID of 3.37 by sampling in one step with additional training. The latter result required only 620k iterations with a batch size of 128, whereas Consistency Distillation demanded 2100k iterations with a larger batch size of 256. Our code is released at https://anonymous.4open.science/r/Catch-Up-Distillation-E31F.
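For intuition, below is a minimal sketch of a catch-up-style training objective under an assumed rectified-flow velocity parameterization (x_t = (1 - t) x_0 + t x_1, ground-truth velocity x_1 - x_0). The function names, the fixed step size `dt`, and the Heun (second-order Runge-Kutta) backward step used to form the previous-moment target are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F


def catch_up_loss(v_model, x0, x1, dt=0.02):
    """Sketch of a catch-up-style objective (illustrative, not the paper's code).

    Assumes a rectified-flow parameterization x_t = (1 - t) * x0 + t * x1,
    so the ground-truth velocity is x1 - x0.
    """
    b = x0.shape[0]
    # Sample a current time t in [dt, 1) and its "previous" moment t - dt.
    t = torch.rand(b, device=x0.device) * (1.0 - dt) + dt
    t_prev = t - dt
    tb = t.view(-1, *([1] * (x0.dim() - 1)))  # broadcastable time
    xt = (1.0 - tb) * x0 + tb * x1

    v_true = x1 - x0          # ground-truth ODE velocity
    v_cur = v_model(xt, t)    # current-moment prediction

    with torch.no_grad():     # previous-moment target, no gradient
        # Heun step (2nd-order Runge-Kutta) backwards along the probability-flow ODE.
        k1 = v_model(xt, t)
        x_euler = xt - dt * k1
        k2 = v_model(x_euler, t_prev)
        x_prev = xt - 0.5 * dt * (k1 + k2)
        v_prev = v_model(x_prev, t_prev)

    # Match the data term and "catch up" with the previous-moment output.
    return F.mse_loss(v_cur, v_true) + F.mse_loss(v_cur, v_prev)
```

In practice the previous-moment target would likely come from an EMA or stop-gradient copy of the network, as in related distillation schemes; here a no-grad evaluation of the same model stands in for that detail.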
