- The paper introduces a novel technique that applies a Cauchy loss function to reduce the impact of impulsive outliers in latent spaces.
- It integrates diffusion loss at early timesteps and optimal transport coupling to stabilize training and minimize error accumulation.
- Empirical results on high-resolution image datasets show high-fidelity outputs and competitive FID/Recall scores with only one or two sampling steps.
Improved Training Technique for Latent Consistency Models
The paper "Improved Training Technique for Latent Consistency Models" presents an advancement in the methodology for training consistency models within latent spaces, addressing significant challenges posed by impulsive outliers found in latent datasets compared to pixel-based training. This work improves upon the established framework of consistency models, which originally showed promise for generating high-quality samples with notable computational efficiency but primarily in pixel space. The authors introduce several key modifications that appear to facilitate training stability and performance enhancement when extending to latent spaces, particularly relevant in large-scale applications such as text-to-image and video generation tasks.
Core Contributions and Methodologies
The authors identify that in transitioning from pixel to latent space, data often contains impulsive outliers that adversely affect performance. To mitigate this issue, the following strategies are proposed:
- Cauchy Loss Function: Replacing the Pseudo-Huber loss with a Cauchy loss sharply reduces the influence of outliers and stabilizes training in the presence of extreme latent values. Pseudo-Huber is robust only up to a point: its gradient stays roughly constant for large residuals, whereas the Cauchy loss progressively damps the contribution of outlier points, improving convergence and sample quality (both losses are sketched after this list).
- Diffusion Loss at Early Timesteps: Adding a diffusion (denoising) objective at small noise levels regularizes training and anchors the consistency function to the data distribution near t ≈ 0, reducing the accumulation of temporal-difference errors (see the sketch after this list).
- Optimal Transport (OT) Coupling: Within each minibatch, noise and data samples are paired via an optimal transport assignment rather than by arbitrary batch order. This reduces the variance of the training targets, making the fitting process more efficient and improving generalization (a minibatch coupling sketch follows the list).
- Adaptive Scaling-c Scheduler: The scheduler adjusts the scale parameter c of the robust loss over the course of training, following the exponential curriculum used for step discretization, so the robustness threshold tightens as training progresses. This control is important for making consistency training work in complex latent spaces (an illustrative schedule appears below).
- Non-scaling LayerNorm (NsLN): Removing the learnable scaling factor from LayerNorm reduces sensitivity to outliers, since extreme features can no longer inflate a learned per-channel gain. This lets normalization capture feature statistics more reliably and contributes to robust performance in latent space (a sketch is given below).
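To make the robustness contrast concrete, here is a minimal PyTorch sketch of the two losses, assuming the common Pseudo-Huber form sqrt(d² + c²) − c and the Cauchy (Lorentzian) form log(1 + d²/c²); the paper's exact constants and weighting may differ.

```python
import torch

def pseudo_huber_loss(x, y, c):
    # Pseudo-Huber: sqrt(||x - y||^2 + c^2) - c; quadratic near zero, linear in the
    # tails, so large residuals still contribute a roughly constant gradient.
    d2 = (x - y).flatten(1).pow(2).sum(dim=1)
    return torch.sqrt(d2 + c ** 2) - c

def cauchy_loss(x, y, c):
    # Cauchy (Lorentzian): log(1 + ||x - y||^2 / c^2); the gradient shrinks as the
    # residual grows, so impulsive latent outliers are strongly down-weighted.
    d2 = (x - y).flatten(1).pow(2).sum(dim=1)
    return torch.log1p(d2 / c ** 2)
```

The key difference is in the gradients: Pseudo-Huber keeps penalizing large residuals at a near-constant rate, while the Cauchy gradient decays toward zero, so a handful of extreme latent values barely move the parameters.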
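The early-timestep diffusion regularizer can be approximated by a simple auxiliary denoising term at small noise levels. The model interface, noise range, and weighting below are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def early_timestep_diffusion_loss(model, x0, sigma_small=0.5):
    # Auxiliary denoising term at small noise levels: the consistency function,
    # evaluated near t ~ 0, should map lightly-noised latents back to the clean
    # latent x0. `model(x_t, sigma)` returning a denoised estimate is an assumed API.
    sigma = torch.rand(x0.shape[0], device=x0.device) * sigma_small
    noise = torch.randn_like(x0)
    x_t = x0 + sigma.view(-1, 1, 1, 1) * noise
    pred = model(x_t, sigma)
    return F.mse_loss(pred, x0)
```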
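Minibatch OT coupling can be sketched with SciPy's exact assignment solver: each latent is paired with the noise sample that minimizes the total squared transport cost within the batch. The cost function and solver choice here are illustrative.

```python
import torch
from scipy.optimize import linear_sum_assignment

def ot_couple_noise(x0, noise):
    # Minibatch OT coupling: re-order the noise samples so that the total squared
    # transport cost of the (data, noise) pairs is minimal, instead of pairing by
    # arbitrary batch order; this lowers the variance of the training signal.
    cost = torch.cdist(x0.flatten(1), noise.flatten(1)) ** 2  # (B, B) pairwise costs
    _, col_ind = linear_sum_assignment(cost.cpu().numpy())    # exact assignment
    return noise[col_ind]  # noise[i] is now the optimal partner of x0[i]
```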
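A hypothetical form of the adaptive scaling-c schedule, tying c to an iCT-style exponential discretization curriculum; all constants and the exact decay rule below are assumptions, not values taken from the paper.

```python
import math

def scaling_c(step, total_steps, c0=1e-2, c_min=1e-4, s0=10, s1=1280):
    # Hypothetical adaptive c schedule: the number of discretization steps N(k)
    # doubles along an exponential curriculum from s0 to s1 steps, and the Cauchy
    # scale c shrinks in proportion to 1 / N(k). Constants are illustrative only.
    k = step / total_steps
    n_k = min(s0 * 2 ** math.floor(k * math.log2(s1 / s0)), s1)
    return max(c0 * s0 / n_k, c_min)
```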
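Non-scaling LayerNorm can be sketched by dropping the learnable scale from a standard LayerNorm; keeping the bias term is an assumption about the exact formulation.

```python
import torch
import torch.nn as nn

class NonScalingLayerNorm(nn.Module):
    """LayerNorm with the learnable scale removed (only a bias is kept, which is an
    assumption), so outlier features cannot inflate a learned per-channel gain."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        return (x - mean) / torch.sqrt(var + self.eps) + self.bias
```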
Results and Implications
Empirical evaluations across high-resolution image datasets—CelebA-HQ, LSUN Church, and FFHQ—demonstrate the technique's capability to bridge performance gaps between consistency and diffusion models in latent spaces. Notably, the results show that the proposed model achieves favorable FID (Fréchet Inception Distance) and Recall scores using only one or two denoising steps. These findings suggest that the modified consistency training framework provides a viable path for efficiently scaling generative models to large, complex datasets, effectively mitigating the computational expenses of multi-step diffusion model sampling.
The implications of this paper are substantial for the development of generative models capable of high-fidelity outputs with limited computational overhead. The advancements in training techniques emphasize the potential for leveraging consistency models in real-world applications where speed and efficiency are pivotal, such as real-time video generation or interactive media content creation.
Future Directions
While the paper effectively addresses critical challenges associated with latent consistency modeling, further exploration could extend into areas such as architectural innovations or enhanced normalization schemes that inherently counteract impulsive noise effects. Additionally, integrating the latent space techniques with other state-of-the-art consistency models like the Consistency Trajectory Models (CTM) may yield even more efficient and robust outcomes.
In summary, this research contributes meaningful advancements in training latent consistency models, paving the way for their broader application and underpinning vital developments in generative AI. The methodologies proposed are not only technically sound but also strategically aligned with enhancing model reliability and performance across diverse application scenarios.