Overview of Latent Consistency Models
Latent Diffusion Models (LDMs), such as Stable Diffusion, have shown remarkable capabilities in generating high-resolution images from textual descriptions. However, their iterative reverse sampling process is slow, which makes them poorly suited to real-time applications. Latent Consistency Models (LCMs) offer an approach to fast, high-resolution image generation that cuts the required number of sampling steps to as few as one to four.
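To give a sense of what few-step sampling looks like in practice, here is a minimal inference sketch. It assumes the Hugging Face diffusers library (its LCMScheduler and the community LCM checkpoint SimianLuo/LCM_Dreamshaper_v7), none of which is specified in the summary above; the step count and guidance settings are illustrative rather than prescriptive.

```python
# Minimal sketch: few-step text-to-image with an LCM via Hugging Face `diffusers`.
# The checkpoint name and settings below are assumptions, not from the summary.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7",       # a publicly released LCM checkpoint
    torch_dtype=torch.float16,
)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)  # few-step LCM sampler
pipe.to("cuda")

# Four sampling steps instead of the 25-50 typical for a standard LDM.
image = pipe(
    prompt="a high-resolution photo of an astronaut riding a horse",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
image.save("lcm_sample.png")
```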
Distillation for Few-step Inference
LCMs are obtained through a one-stage guided distillation process that treats classifier-free guided sampling as solving an augmented probability flow ODE (PF-ODE) directly in latent space. The distilled model learns to predict the solution of this ODE, so it can produce high-fidelity samples from a pre-trained LDM in just a few steps, or even a single step. Training is efficient: a high-quality 768x768-resolution LCM can be distilled in roughly 32 A100 GPU hours, and the proposed Skipping-Step technique further accelerates convergence during distillation.
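To make the distillation objective concrete, the consistency loss with the Skipping-Step technique can be written roughly as follows (a sketch of the paper's formulation: $f_\theta$ is the latent consistency function, $\theta^-$ an EMA copy of $\theta$, $\Psi$ a numerical PF-ODE solver such as DDIM, $\omega$ the classifier-free guidance scale, $c$ the text condition, $d(\cdot,\cdot)$ a distance metric, and $k$ the skipping interval):

$$
\mathcal{L}_{\mathrm{LCD}}(\theta, \theta^{-}; \Psi) \;=\;
\mathbb{E}_{z, c, \omega, n}\!\left[
d\!\left(
f_{\theta}\!\left(z_{t_{n+k}}, \omega, c, t_{n+k}\right),\;
f_{\theta^{-}}\!\left(\hat{z}^{\Psi,\omega}_{t_{n}}, \omega, c, t_{n}\right)
\right)
\right],
$$

where the target latent is estimated by jumping $k$ discretization steps along the CFG-augmented PF-ODE with the teacher's solver:

$$
\hat{z}^{\Psi,\omega}_{t_{n}} \;\approx\;
z_{t_{n+k}}
\;+\; (1+\omega)\,\Psi\!\left(z_{t_{n+k}}, t_{n+k}, t_{n}, c\right)
\;-\; \omega\,\Psi\!\left(z_{t_{n+k}}, t_{n+k}, t_{n}, \varnothing\right).
$$

Larger $k$ shortens the effective trajectory the student must traverse, which is why the Skipping-Step technique speeds up convergence.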
Fine-tuning on Custom Datasets
The paper also introduces Latent Consistency Fine-tuning (LCF), which adapts a pre-trained LCM to customized image datasets while preserving its few-step inference capability. LCF is practical for downstream tasks where an LCM must be tailored to a specific style or content domain without first training a teacher diffusion model on the new dataset; a rough sketch of this kind of teacher-free adaptation follows below.
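As an illustration of the kind of teacher-free adaptation LCF enables, the sketch below shows a generic consistency-training-style fine-tuning step in latent space. It is an analogue under stated assumptions, not the paper's exact LCF procedure; `lcm`, `ema_lcm`, `vae_encode`, and the noise-schedule tensors are hypothetical placeholders.

```python
# Illustrative sketch only: a teacher-free, consistency-training-style fine-tuning
# step in latent space. It mirrors what LCF makes possible (adapting an LCM to a
# custom dataset without a teacher diffusion model) but is NOT the paper's exact
# procedure. All names here are hypothetical placeholders.
import torch
import torch.nn.functional as F

def lcf_step(lcm, ema_lcm, vae_encode, batch, timesteps, alphas, sigmas, k,
             optimizer, ema_decay=0.95):
    """One fine-tuning step on a custom (images, text_embeddings) batch."""
    images, text_emb = batch
    z0 = vae_encode(images)                    # clean latents from the custom dataset
    eps = torch.randn_like(z0)                 # the same noise reused at both timesteps

    n = torch.randint(0, len(timesteps) - k, (1,)).item()
    hi, lo = n + k, n                          # skipping-step pair, k steps apart

    # Noise the same latent to two points on the trajectory (no teacher required).
    z_hi = alphas[hi] * z0 + sigmas[hi] * eps
    z_lo = alphas[lo] * z0 + sigmas[lo] * eps

    # Consistency objective: online prediction at t_{n+k} matches EMA target at t_n.
    pred = lcm(z_hi, timesteps[hi], text_emb)
    with torch.no_grad():
        target = ema_lcm(z_lo, timesteps[lo], text_emb)

    loss = F.huber_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Exponential moving average update of the target network.
    with torch.no_grad():
        for p_ema, p in zip(ema_lcm.parameters(), lcm.parameters()):
            p_ema.lerp_(p, 1.0 - ema_decay)

    return loss.item()
```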
Evaluation Results
Evaluation on the LAION-5B-Aesthetics dataset shows that LCMs achieve state-of-the-art text-to-image generation quality with far fewer inference steps. Notably, LCMs outperform competing methods, including DDIM-style samplers and Guided-Distill baselines, in low-step regimes (roughly 1-4 steps), striking a compelling balance between image quality and generation speed.
Conclusion and Future Work
In summary, LCMs emerge as a promising solution for fast and high-quality image generation from text. They inherit the strengths of diffusion-based generative models while shedding the limitations of lengthy iterative processes. Prospects for future research include expanding LCM applications to additional image synthesis tasks like editing, inpainting, and super-resolution, broadening the model's utility in real-world scenarios.