
Score-based Generative Modeling in Latent Space (2106.05931v3)

Published 10 Jun 2021 in stat.ML and cs.LG

Abstract: Score-based generative models (SGMs) have recently demonstrated impressive results in terms of both sample quality and distribution coverage. However, they are usually applied directly in data space and often require thousands of network evaluations for sampling. Here, we propose the Latent Score-based Generative Model (LSGM), a novel approach that trains SGMs in a latent space, relying on the variational autoencoder framework. Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space, resulting in fewer network evaluations and faster sampling. To enable training LSGMs end-to-end in a scalable and stable manner, we (i) introduce a new score-matching objective suitable to the LSGM setting, (ii) propose a novel parameterization of the score function that allows SGM to focus on the mismatch of the target distribution with respect to a simple Normal one, and (iii) analytically derive multiple techniques for variance reduction of the training objective. LSGM obtains a state-of-the-art FID score of 2.10 on CIFAR-10, outperforming all existing generative results on this dataset. On CelebA-HQ-256, LSGM is on a par with previous SGMs in sample quality while outperforming them in sampling time by two orders of magnitude. In modeling binary images, LSGM achieves state-of-the-art likelihood on the binarized OMNIGLOT dataset. Our project page and code can be found at https://nvlabs.github.io/LSGM .

Authors (3)
  1. Arash Vahdat (69 papers)
  2. Karsten Kreis (50 papers)
  3. Jan Kautz (215 papers)
Citations (585)

Summary

  • The paper presents the latent score-based generative model (LSGM), enhancing model expressivity and reducing network evaluations for faster sampling.
  • It introduces a tailored score-matching objective and a deviation-based parameterization that streamline training and improve overall performance.
  • By operating in a VAE latent space, LSGM models diverse data types effectively, sets a new state of the art on CIFAR-10, and matches prior SGM sample quality on CelebA-HQ-256 at a fraction of the sampling cost.

Score-based Generative Modeling in Latent Space: An Overview

The paper "Score-based Generative Modeling in Latent Space" presents a significant advancement in generative modeling by introducing the Latent Score-based Generative Model (LSGM). This method leverages Score-based Generative Models (SGMs) within a latent space framework, utilizing Variational Autoencoders (VAEs) to address computational inefficiencies typical in SGMs.

Key Innovations

The authors of this paper propose several innovations to optimize the performance of SGMs:

  1. Latent Space Training: Embedding the SGM in a latent space increases expressivity, accommodates non-continuous data, and yields a smoother model in a smaller space, substantially reducing the number of network evaluations required for sampling.
  2. Score-Matching Objective: A new score-matching objective is designed specifically for the LSGM setting, enabling stable, scalable training. It lets the VAE and the latent SGM components be learned jointly, end-to-end (shown schematically after this list).
  3. Novel Parameterization: The score function is parameterized to model only the deviation of the target latent distribution from a simple Normal one, which simplifies training and improves performance (see the code sketch after this list).
  4. Variance Reduction Techniques: Analytical methods for reducing variance in the training objective are provided, further enhancing efficiency and stability.
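The joint objective (item 2) rests on expressing the cross-entropy between the encoder posterior and the SGM prior as a denoising score-matching loss, stated here schematically (the paper's analysis gives the precise weighting and constant):

$$
\mathrm{CE}\big(q(\mathbf{z}_0 \mid \mathbf{x})\,\|\,p(\mathbf{z}_0)\big) = \mathbb{E}_{t \sim \mathcal{U}[0,1]}\!\left[\frac{g(t)^2}{2}\, \mathbb{E}_{q(\mathbf{z}_t, \mathbf{z}_0 \mid \mathbf{x})}\big\|\nabla_{\mathbf{z}_t} \log q(\mathbf{z}_t \mid \mathbf{z}_0) - \nabla_{\mathbf{z}_t} \log p(\mathbf{z}_t)\big\|_2^2\right] + \text{const},
$$

where $g(t)$ is the diffusion coefficient. Because both the encoder and the latent score model appear in this single term, the whole system can be trained end-to-end.

The deviation-based parameterization (item 3) can be illustrated with a short sketch. This is a minimal illustration rather than the paper's exact formulation: the class name, the tiny time-conditioned MLP, and the scalar mixing weight are assumptions; the grounded idea is that the score of N(0, I) is known in closed form, so the network only has to model the residual mismatch.

```python
import torch
import torch.nn as nn

class MixedScore(nn.Module):
    """Hypothetical sketch of a 'mixed' score parameterization: the exact
    score of N(0, I) is combined with a learned residual, so the network
    only models the mismatch from a standard Normal."""

    def __init__(self, latent_dim: int, hidden: int = 256):
        super().__init__()
        # Small time-conditioned MLP standing in for the paper's score network.
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )
        self.mix_logit = nn.Parameter(torch.zeros(1))  # learnable mixing weight

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        normal_score = -z_t                            # closed-form score of N(0, I)
        h = torch.cat([z_t, t[:, None]], dim=-1)       # condition on diffusion time
        learned_score = self.net(h)
        a = torch.sigmoid(self.mix_logit)              # mixing weight in (0, 1)
        return (1.0 - a) * normal_score + a * learned_score
```

A design note: when the mixing weight is near zero, the prior collapses to the tractable Normal, so training can begin close to a well-behaved solution and spend network capacity only on the mismatch.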

Strong Numerical Results

The paper reports impressive performance metrics:

  • LSGM achieves a state-of-the-art Fréchet Inception Distance (FID) of 2.10 on CIFAR-10, outperforming existing generative models on this dataset.
  • On CelebA-HQ-256, the sample quality is on par with previous SGMs while achieving faster sampling times by two orders of magnitude.
  • On binary data, LSGM achieves state-of-the-art likelihood on the binarized OMNIGLOT dataset.

Methodological Implications

The introduction of a latent space approach in SGMs opens new avenues for increased model expressivity and flexibility. It also mitigates the heavy computational load of traditional SGMs: because the latent distribution is smoother and lower-dimensional than the data distribution, the reverse-time process used for synthesis needs far fewer network evaluations, as sketched below.
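A minimal sketch of the resulting two-stage synthesis follows. The VP-type SDE with a constant noise schedule, the Euler-Maruyama discretization, and the names `score_model` and `decoder` are simplifying assumptions for illustration, not the paper's exact sampler:

```python
import torch

@torch.no_grad()
def sample_lsgm(score_model, decoder, shape, n_steps=100, device="cpu"):
    """Illustrative LSGM sampling: run reverse-time diffusion entirely in
    the small latent space, then decode once."""
    beta, dt = 1.0, 1.0 / n_steps
    z = torch.randn(shape, device=device)          # start from the Normal prior
    for i in reversed(range(n_steps)):
        t = torch.full((shape[0],), (i + 1) * dt, device=device)
        score = score_model(z, t)                  # cheap: low-dimensional input
        drift = -0.5 * beta * z - beta * score     # reverse-time SDE drift
        z = z - drift * dt                         # step backwards in time
        if i > 0:                                  # no noise on the final step
            z = z + (beta * dt) ** 0.5 * torch.randn_like(z)
    return decoder(z)                              # single decoder pass to data space
```

Because every score evaluation touches only low-dimensional latents that are already close to Normal, far fewer and cheaper steps suffice than in pixel-space SGMs, and the decoder is invoked exactly once.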

Additionally, the LSGM framework enables effective modeling of diverse data types, including binary and categorical data, which are notoriously difficult to handle in conventional SGMs.
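One way to make this concrete: the diffusion always operates on continuous latents, while the VAE decoder carries the data-type-specific likelihood. A minimal sketch under that reading (the Bernoulli head below is an illustrative assumption, e.g. for binarized OMNIGLOT):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BernoulliDecoder(nn.Module):
    """Illustrative decoder head for binary images: discreteness lives
    only in p(x | z); the latent SGM itself stays continuous."""

    def __init__(self, latent_dim: int, n_pixels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.SiLU(),
            nn.Linear(512, n_pixels),              # one logit per binary pixel
        )

    def log_likelihood(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # x holds {0, 1} pixel values as floats; returns log p(x | z) per sample
        logits = self.net(z)
        return -F.binary_cross_entropy_with_logits(
            logits, x, reduction="none"
        ).sum(dim=-1)
```

The same pattern extends to categorical data with a softmax head.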

Practical Applications

The enhanced efficiency in sampling and the ability to model various data types suggest numerous applications for LSGM, particularly in areas requiring high-quality synthetic data or complex, multi-modal distributions. For instance, LSGM can be employed in artistic content generation, data augmentation, or any domain where rapid generation of diverse samples is advantageous.

Theoretical Implications and Future Directions

From a theoretical perspective, the work highlights the potential of integrating latent variable models such as VAEs with score-based techniques to achieve improved generative performance. The research suggests that other generative frameworks might benefit similarly from incorporating latent space methodologies.

The research invites further exploration into optimizing the balance between synthesis speed and sample quality. Additionally, there is scope for extending LSGMs to more complex data structures, potentially enhancing their applicability in domains like natural language processing or 3D structure generation.

Conclusion

The paper "Score-based Generative Modeling in Latent Space" provides a comprehensive refinement to SGM frameworks, demonstrating both theoretical and practical advancements. By situating SGMs within a latent space context, the authors succeed in addressing significant limitations of traditional models, paving the way for more efficient and adaptable generative processes in complex domains.
