Multistep Consistency Models (2403.06807v3)

Published 11 Mar 2024 in cs.LG, cs.CV, and stat.ML

Abstract: Diffusion models are relatively easy to train but require many steps to generate samples. Consistency models are far more difficult to train, but generate samples in a single step. In this paper we propose Multistep Consistency Models: a unification between Consistency Models (Song et al., 2023) and TRACT (Berthelot et al., 2023) that can interpolate between a consistency model and a diffusion model, trading off sampling speed against sampling quality. Specifically, a 1-step consistency model is a conventional consistency model, whereas an $\infty$-step consistency model is a diffusion model. Multistep Consistency Models work really well in practice. By increasing the sample budget from a single step to 2-8 steps, we can more easily train models that generate higher quality samples, while retaining much of the sampling speed benefit. Notable results are 1.4 FID on ImageNet 64 in 8 steps and 2.1 FID on ImageNet 128 in 8 steps with consistency distillation, using simple losses without adversarial training. We also show that our method scales to a text-to-image diffusion model, generating samples that are close in quality to those of the original model.

References (21)
  1. TRACT: Denoising diffusion models with transitive closure time-distillation. arXiv preprint arXiv:2303.04248, 2023.
  2. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
  3. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.
  4. Gotta go fast when generating data with score-based models. arXiv preprint arXiv:2105.14080, 2021.
  5. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022.
  6. Understanding the diffusion objective as a weighted integral of ELBOs. arXiv preprint arXiv:2303.00848, 2023.
  7. Variational diffusion models. arXiv preprint arXiv:2107.00630, 2021.
  8. DiffWave: A versatile diffusion model for audio synthesis. In 9th International Conference on Learning Representations (ICLR), 2021.
  9. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
  10. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
  11. Power hungry processing: Watts driving the cost of AI deployment? arXiv preprint arXiv:2311.16863, 2023.
  12. Diff-Instruct: A universal approach for transferring knowledge from pre-trained diffusion models. arXiv preprint arXiv:2305.18455, 2023.
  13. On distillation of guided diffusion models. arXiv preprint arXiv:2210.03142, 2022.
  14. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022.
  15. Progressive distillation for fast sampling of diffusion models. In The Tenth International Conference on Learning Representations (ICLR), 2022.
  16. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
  17. Denoising diffusion implicit models. In 9th International Conference on Learning Representations (ICLR), 2021.
  18. Improved techniques for training consistency models. arXiv preprint arXiv:2310.14189, 2023.
  19. Score-based generative modeling through stochastic differential equations. In 9th International Conference on Learning Representations (ICLR), 2021.
  20. Consistency models. In International Conference on Machine Learning (ICML), 2023.
  21. Fast sampling of diffusion models via operator learning. In International Conference on Machine Learning (ICML), 2023.
Authors (3)
  1. Jonathan Heek
  2. Emiel Hoogeboom
  3. Tim Salimans

Summary

Unifying Consistency Models and TRACT for Efficient Diffusion Model Sampling

Introduction to Multistep Consistency Models

Within generative modeling, and diffusion models in particular, the trade-off between sampling efficiency and output quality has been a focal point of research. Traditional diffusion models generate high-quality samples but require many iterative denoising steps, which drives up computational cost and sampling time. Consistency Models, introduced by Song et al. (2023), mitigate this inefficiency by generating samples in a single step, though this speed often comes at the expense of sample quality.

In this context, we introduce Multistep Consistency Models, a novel methodology that effectively bridges the gap between the traditional multi-step diffusion models and the single-step Consistency Models. Our proposed model allows for a flexible middle ground by enabling sample generation in multiple steps, providing a customizable balance between quality and efficiency.
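
To make this interpolation concrete, the following Python sketch shows how sampling with a multistep consistency model can proceed: at each of a small number of segment boundaries, the model jumps to an estimate of the clean sample and is then deterministically re-noised to the next boundary. This is a minimal illustration rather than the paper's exact algorithm; the cosine schedule, the model(x, t) interface (assumed to predict the clean sample), and all names are assumptions of the sketch.

    import numpy as np

    def multistep_consistency_sample(model, shape, num_steps=8, seed=0):
        # Sketch of multistep consistency sampling with `num_steps` segments.
        # `model(x, t)` is assumed to return an estimate of the clean sample x0
        # at noise level t; alpha/sigma form a simple variance-preserving schedule.
        rng = np.random.default_rng(seed)
        alpha = lambda t: np.cos(0.5 * np.pi * t)
        sigma = lambda t: np.sin(0.5 * np.pi * t)

        ts = np.linspace(1.0, 0.0, num_steps + 1)   # boundaries 1 = t_N > ... > t_0 = 0
        x = rng.standard_normal(shape)              # start from pure noise at t = 1
        for t_cur, t_next in zip(ts[:-1], ts[1:]):
            x0_hat = model(x, t_cur)                # jump to the model's clean-sample estimate
            if t_next <= 0.0:
                return x0_hat                       # final segment: return the estimate directly
            eps_hat = (x - alpha(t_cur) * x0_hat) / sigma(t_cur)
            x = alpha(t_next) * x0_hat + sigma(t_next) * eps_hat   # DDIM-style re-noising
        return x

With num_steps=1 this collapses to a standard consistency model (a single jump from noise to data), while letting num_steps grow approaches ordinary deterministic diffusion-style sampling, which is exactly the speed-versus-quality dial described above.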

Key Contributions

Our research presents several key contributions to the domain of generative modeling with diffusion-based models:

  • We propose a novel framework termed Multistep Consistency Models, which unifies the concepts underlying Consistency Models and TRACT. This framework enables interpolation between the traditional diffusion model and single-step consistency models, allowing users to choose an optimal point in terms of sampling speed versus quality.
  • Through extensive experimentation, particularly on challenging datasets such as ImageNet, we demonstrate that by increasing the steps from one to a modest range (2-8 steps), we can significantly enhance sample quality while retaining the benefits of reduced sampling time. Remarkably, we attain competitive FID scores on par with baseline diffusion models in as few as 8 steps.
  • A critical aspect of our methodology is the introduction of the Adjusted DDIM (aDDIM) sampler, a deterministic sampling technique that mitigates the integration errors inherent in deterministic samplers like DDIM, reducing sample blurriness and improving fidelity (a deterministic update of this kind is sketched after this list).
  • Theoretical discussions within our paper illustrate that as the number of steps in Multistep Consistency Training increases, the model increasingly resembles a standard diffusion model, thereby reinforcing the intuition behind our approach.
  • Our research underscores the importance of step schedule annealing and synchronized dropout, which proved pivotal both for making training easier and for reaching higher sample quality.
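
The step below is a minimal Python sketch of the kind of deterministic update the aDDIM sampler builds on, again assuming a cosine variance-preserving schedule and a model(x, t) that predicts the clean sample (both are assumptions of this sketch). With noise_boost = 0 it is a plain DDIM step; the paper's aDDIM instead adds back an estimate of the variance that a deterministic step discards, and the noise_boost factor here is only an illustrative placeholder for that correction, not the paper's exact formula.

    import numpy as np

    def ddim_like_step(x, t_cur, t_next, model, noise_boost=0.0):
        # One deterministic DDIM-style update from noise level t_cur down to t_next.
        # A plain DDIM step (noise_boost = 0) ignores the variance of the x0
        # prediction, which tends to shrink the update and blur samples; the
        # aDDIM idea is to compensate for that missing variance.
        alpha = lambda t: np.cos(0.5 * np.pi * t)   # variance-preserving schedule
        sigma = lambda t: np.sin(0.5 * np.pi * t)

        x0_hat = model(x, t_cur)                    # clean-sample prediction
        eps_hat = (x - alpha(t_cur) * x0_hat) / sigma(t_cur)
        # Illustrative stand-in for the aDDIM correction (not the paper's formula).
        return alpha(t_next) * x0_hat + sigma(t_next) * (1.0 + noise_boost) * eps_hat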

Implications and Speculations on Future Developments

The introduction of Multistep Consistency Models heralds a significant advancement in the field of generative AI and diffusion models. By offering a flexible framework that interpolates between speed and quality, our methodology presents a compelling solution to one of the primary bottlenecks in diffusion model sampling. This balance is particularly relevant in scenarios requiring rapid sample generation without substantially compromising on output quality.

Looking forward, the versatility of Multistep Consistency Models opens avenues for deeper exploration into efficient training strategies, the further evolution of deterministic samplers, and the potential integration of these concepts into broader applications beyond image generation, including video and audio synthesis.

Moreover, our findings invite further investigation into the theoretical underpinnings of consistency models and diffusion processes, potentially paving the way for novel generative models that transcend the limitations of current methodologies.

In summary, Multistep Consistency Models represent a pivotal step toward the refinement of diffusion-based generative models, promising not only enhanced efficiency and sample quality but also inspiring future innovations in generative AI research.