- The paper introduces LCSC, a novel method that linearly combines intermediate checkpoints to enhance diffusion and consistency model performance.
- It achieves over 14-fold acceleration in training and improved FID scores by optimizing combination coefficients in a low-dimensional search space.
- The study reveals unexplored weight space structures, offering fresh insights for efficient generative model training and optimization.
Enhancing Generative Models through Linear Combination of Checkpoints: A Study on Consistency and Diffusion Models
Introduction
Generative modeling has witnessed significant advancements with the advent of Diffusion Models (DMs) and Consistency Models (CMs), both demonstrating compelling performance across a variety of tasks. A common practice when training these models is to keep only the final converged weight checkpoint for generation. However, this practice overlooks the wealth of valuable information embedded in intermediate checkpoints. This paper investigates a novel methodology, termed Linear Combination of Saved Checkpoints (LCSC), which exploits these intermediate checkpoints to match, or even surpass, the generative quality of fully trained models at a fraction of the training cost.
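Concretely, the merged weights are a weighted sum over saved checkpoints. The rendering below (with θ_i for the i-th saved checkpoint, α_i for its coefficient, and a sum-to-one constraint) is an illustrative formulation rather than the paper's exact notation.

```latex
% Merged weights as a linear combination of n saved checkpoints
% \theta_i : weights of the i-th intermediate checkpoint
% \alpha_i : combination coefficient found by search (not hand-set)
\theta_{\mathrm{LCSC}} \;=\; \sum_{i=1}^{n} \alpha_i\,\theta_i,
\qquad \sum_{i=1}^{n} \alpha_i = 1
```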
Observations and Motivations
The investigation into the training dynamics of DMs and CMs reveals that the trajectory traversed in weight space contains numerous checkpoints that, when appropriately combined, yield model performance unreachable by conventional optimization routes such as Stochastic Gradient Descent (SGD) and its variants. Moreover, although Exponential Moving Average (EMA) is widely applied to stabilize training, the paper's findings indicate that it is sub-optimal, leaving clear room for improvement.
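One way to see why EMA leaves room for improvement: the EMA weights are themselves a linear combination of past checkpoints, but with coefficients fixed entirely by a single decay rate. The sketch below makes that explicit; the function names, the default decay value, and the assumption that EMA starts from the first saved checkpoint are illustrative choices, not details from the paper.

```python
def ema_coefficients(num_ckpts: int, decay: float = 0.999):
    """Unroll the EMA update theta_ema <- decay * theta_ema + (1 - decay) * theta_t.

    The result is a *fixed* linear combination of the saved checkpoints,
    with geometrically decaying coefficients that sum to 1.
    """
    coeffs = [decay ** (num_ckpts - 1)]                       # weight retained by the first checkpoint
    coeffs += [(1.0 - decay) * decay ** (num_ckpts - 1 - t)   # geometric decay toward older checkpoints
               for t in range(1, num_ckpts)]
    return coeffs

def combine_checkpoints(state_dicts, coeffs):
    """Weighted sum of checkpoint state_dicts (floating-point parameters assumed).

    EMA corresponds to the fixed coefficients above; LCSC instead treats the
    coefficients as free variables and searches over them directly.
    """
    return {key: sum(c * sd[key] for c, sd in zip(coeffs, state_dicts))
            for key in state_dicts[0]}
```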
LCSC: Methodology
LCSC proposes an optimization framework that operates in a low-dimensional search space, optimizing a small set of combination coefficients over selected checkpoints. The objective is the generative quality of the merged model as measured by established metrics such as Fréchet Inception Distance (FID). Because this objective is non-differentiable and expensive to evaluate, LCSC determines the coefficients with an evolutionary search rather than gradient-based methods. The approach proves effective both for reducing the computational cost of training strong models and for further improving fully trained ones.
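A minimal sketch of what such a coefficient search could look like is given below. The population size, Gaussian mutation scheme, sum-to-one normalization, and the evaluate_fid callback are placeholder assumptions for illustration; the paper's actual evolutionary algorithm may differ.

```python
import numpy as np

def merge(checkpoints, coeffs):
    """Linear combination of checkpoint state_dicts with the given coefficients."""
    return {k: sum(c * sd[k] for c, sd in zip(coeffs, checkpoints))
            for k in checkpoints[0]}

def search_coefficients(checkpoints, evaluate_fid, iters=100, pop=16, sigma=0.02, seed=0):
    """Evolutionary search for coefficients that minimize FID of the merged model.

    checkpoints  : list of state_dicts (saved intermediate weights)
    evaluate_fid : callable mapping a merged state_dict to an FID estimate
    """
    rng = np.random.default_rng(seed)
    n = len(checkpoints)
    best = np.full(n, 1.0 / n)                             # start from the uniform average
    best_fid = evaluate_fid(merge(checkpoints, best))
    for _ in range(iters):
        for _ in range(pop):
            cand = best + sigma * rng.standard_normal(n)   # Gaussian mutation of the coefficients
            cand = cand / cand.sum()                       # renormalize (simplifying constraint in this sketch)
            fid = evaluate_fid(merge(checkpoints, cand))
            if fid < best_fid:                             # greedy selection on the target metric
                best, best_fid = cand, fid
    return best, best_fid
```

In practice, an evaluate_fid callback of this kind would generate a batch of samples with the merged weights and score them against reference statistics, so the search cost is dominated by sampling rather than by any further training.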
Experimental Validations
A comprehensive set of experiments across two primary use cases, reducing training cost and enhancing pre-trained models, demonstrates LCSC's efficacy. Notably, for consistency models trained with consistency distillation (CD) on CIFAR-10, LCSC reaches a markedly better (lower) FID than the fully trained base models while using considerably fewer training iterations, corresponding to an over 14-fold acceleration in training. Furthermore, when applied to pre-trained DMs and CMs, LCSC consistently improves sample quality or speeds up generation, showcasing its potential for refining the outputs of these models.
Theoretical Implications and Future Directions
The results obtained with LCSC suggest that the weight space of DMs and CMs contains rich structure and basins of strong performance that are not readily reachable through conventional training. The ability of LCSC to locate these basins via linear combinations of checkpoints opens new avenues for understanding and exploiting the training dynamics of generative models. Future work may extend the approach to other generative models and neural networks, further advancing efficient and effective model training and optimization.
Conclusion
This paper introduces a promising technique, LCSC, which, by harnessing intermediate weight checkpoints, can significantly enhance the performance of generative models, notably DMs and CMs. The method offers a novel perspective on optimizing generative model performance, providing practical benefits in computational efficiency as well as theoretical insight into the structure of the weight landscape. Its utility in both accelerating training and enhancing pre-trained models, demonstrated through rigorous experimentation, makes LCSC a valuable contribution to the field of generative modeling.