Improved Techniques for Training Score-Based Generative Models
The paper by Yang Song and Stefano Ermon presents methodological improvements for training score-based generative models. These models offer an attractive alternative to GANs because they avoid the difficulties of adversarial optimization, but prior score-based models were limited to low-resolution images (typically below 32×32) and could be unstable under some hyperparameter settings. This paper introduces several techniques that address these constraints.
Theoretical Analysis and Noise Scale Determination
A detailed theoretical analysis of score-based generative models in high-dimensional spaces is presented. This analysis explains existing failure modes and motivates principled methods for selecting noise scales, a choice that is crucial for model performance and stability; settings inherited from earlier work break down for images larger than 32×32. The authors instead derive the noise scales from the data: the largest scale is set to the maximum Euclidean distance between pairs of training examples, and the remaining scales form a geometric progression whose common ratio is chosen by a dimension-dependent criterion, so that adjacent noise levels overlap enough for sampling to traverse them.
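To make the schedule concrete, the NumPy sketch below computes a data-driven maximum scale and a geometric progression down to a small minimum scale. This is a minimal illustration under stated assumptions, not the authors' code; the function names and the example values for `sigma_min` and the number of scales are hypothetical.

```python
import numpy as np

def max_pairwise_distance(data: np.ndarray) -> float:
    """Largest Euclidean distance between any two rows of `data`.

    The paper suggests setting the largest noise scale sigma_1 to this
    value so the perturbed distribution covers the whole data range.
    `data` holds flattened images, shape (n_samples, dim); use a random
    subsample for large datasets, since this is O(n^2) in memory.
    """
    sq_norms = (data ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * data @ data.T
    return float(np.sqrt(np.maximum(sq_dists, 0.0).max()))

def geometric_noise_scales(sigma_max: float, sigma_min: float,
                           num_scales: int) -> np.ndarray:
    """Decreasing geometric progression sigma_1 > ... > sigma_L.

    The common ratio sigma_{i+1}/sigma_i is constant; the paper gives a
    dimension-dependent criterion for choosing it, which this sketch
    replaces by simply fixing both endpoints.
    """
    return np.exp(np.linspace(np.log(sigma_max), np.log(sigma_min), num_scales))

# Hypothetical usage with illustrative values (not the paper's exact settings):
# x = training_images.reshape(len(training_images), -1)
# sigmas = geometric_noise_scales(max_pairwise_distance(x), 0.01, 232)
```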
Efficient Amortization and Exponential Moving Average
The paper introduces an efficient way to amortize score estimation across a large number of noise scales with a single neural network: the noise conditional score network (NCSN) is parameterized by an unconditional score network whose output is rescaled for each noise level, i.e., s_θ(x, σ) = s_θ(x)/σ. This removes the need for explicit noise-conditioning layers, simplifying implementation and improving computational efficiency. Additionally, maintaining an exponential moving average (EMA) of the model weights during training, and sampling with the averaged weights, stabilizes training, reduces sample artifacts such as color shifts, and leads to more consistent performance.
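The PyTorch sketch below illustrates both ideas. It assumes a hypothetical `unconditional_net` (any image-to-image architecture whose output has the same shape as its input); the wrapper and EMA class are illustrations of the described techniques, not the authors' implementation.

```python
import copy
import torch
import torch.nn as nn

class NoiseConditionalScoreNet(nn.Module):
    """NCSN parameterized by an unconditional network:
    s_theta(x, sigma) = s_theta(x) / sigma, so the network itself
    needs no sigma-conditioning layers."""

    def __init__(self, unconditional_net: nn.Module):
        super().__init__()
        self.net = unconditional_net

    def forward(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        # sigma has shape (batch,); broadcast it over the image dims.
        return self.net(x) / sigma.view(-1, 1, 1, 1)

class EMA:
    """Exponential moving average of model weights; a decay around
    0.999 is typical. Samples are drawn from the averaged weights."""

    def __init__(self, model: nn.Module, decay: float = 0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()  # averaged copy
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: nn.Module):
        # shadow <- decay * shadow + (1 - decay) * live weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```

In a training loop, `ema.update(model)` would be called after each optimizer step, and samples would be generated with `ema.shadow` rather than the live weights.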
Configuration of Annealed Langevin Dynamics
For sample generation, the authors optimize the configuration of annealed Langevin dynamics, an iterative procedure that refines samples while sweeping from the largest noise level down to the smallest. They recommend making the number of sampling steps per noise scale (T) as large as the compute budget allows, and then selecting the step-size parameter (ε) analytically, by maximizing a derived proxy for convergence quality rather than by exhaustive manual tuning. This configuration ensures efficient convergence and high sample fidelity.
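A minimal sketch of the sampler is shown below, assuming a `score_net(x, sigma)` with the interface sketched earlier and a decreasing schedule `sigmas`. The step size is annealed as α_i = ε·σ_i²/σ_L²; the defaults for `n_steps_each` and `eps` are illustrative, not the paper's reported settings.

```python
import torch

@torch.no_grad()
def annealed_langevin_dynamics(score_net, sigmas, shape,
                               n_steps_each=5, eps=1e-5, device="cpu"):
    """Annealed Langevin dynamics sampler (illustrative sketch).

    score_net(x, sigma) estimates the score grad_x log p_sigma(x);
    `sigmas` is the decreasing noise schedule sigma_1 > ... > sigma_L.
    """
    x = torch.rand(shape, device=device)      # uniform initialization
    sigma_L = float(sigmas[-1])
    for sigma in sigmas:
        sigma = float(sigma)
        # Step size annealed with the noise level: alpha_i = eps * sigma_i^2 / sigma_L^2.
        alpha = eps * (sigma / sigma_L) ** 2
        sigma_batch = torch.full((shape[0],), sigma, device=device)
        for _ in range(n_steps_each):
            z = torch.randn_like(x)
            x = x + 0.5 * alpha * score_net(x, sigma_batch) + (alpha ** 0.5) * z
    # Optional final denoising step, used in some implementations:
    # x = x + sigma_L ** 2 * score_net(x, torch.full((shape[0],), sigma_L, device=device))
    return x
```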
Experimental Results and Implications
Empirical validation on datasets such as CelebA, FFHQ, and several LSUN categories shows that models trained with these improved techniques generate high-fidelity images at resolutions up to 256×256, achieving FID scores that rival those of state-of-the-art GANs. These results underscore the practical viability of the proposed methods.
The implications of this research are twofold. Practically, it enables score-based models to generate high-quality images at high resolutions, opening new avenues for applications in image synthesis, anomaly detection, and semi-supervised learning, among others. Theoretically, the work sets a precedent for further exploration of more sophisticated noise perturbations and alternative dynamics for sample generation. Future research could further optimize hyperparameters and apply these methods to other data modalities, such as speech or behavioral data.
Conclusion
The paper makes substantial contributions to the field of generative modeling by addressing key limitations of score-based models. The proposed techniques for noise scale selection, score network design, model stabilization through EMA, and optimized annealed Langevin dynamics collectively result in significant improvements. These advancements provide a robust foundation for extending score-based generative modeling to more complex and higher-dimensional datasets.