Structured Denoising Diffusion Models in Discrete State-Spaces (2107.03006v3)

Published 7 Jul 2021 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Denoising diffusion probabilistic models (DDPMs) (Ho et al. 2020) have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models for discrete data that generalize the multinomial diffusion model of Hoogeboom et al. 2021, by going beyond corruption processes with uniform transition probabilities. This includes corruption with transition matrices that mimic Gaussian kernels in continuous space, matrices based on nearest neighbors in embedding space, and matrices that introduce absorbing states. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition matrix is an important design decision that leads to improved results in image and text domains. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation while scaling to large vocabularies on LM1B. On the image dataset CIFAR-10, our models approach the sample quality and exceed the log-likelihood of the continuous-space DDPM model.

Authors

Jacob Austin
Daniel D. Johnson
Jonathan Ho
Daniel Tarlow
Rianne van den Berg

Citations (689)

Summary

The paper introduces Discrete Denoising Diffusion Probabilistic Models (D3PMs), which carry the diffusion framework from continuous state spaces over to discrete data such as quantized images and text. D3PMs generalize the multinomial diffusion model of Hoogeboom et al. (2021), whose corruption process uses uniform transition probabilities, by allowing structured transition matrices: matrices that mimic Gaussian kernels, matrices based on nearest neighbors in an embedding space, and matrices with absorbing states.

Methodology

The forward (corruption) process is defined by a transition matrix at each step, whose entries give the probability of moving from one category to another. Because these transitions need not be uniform, the framework supports several corruption processes:

Uniform Transitions: The baseline, in which a state is equally likely to move to any other state, as in the earlier multinomial diffusion model.
Absorbing State Transitions: Each state can move to a dedicated "mask" token and then stays there, a mechanism similar to masked language models.
Discretized Gaussian Transitions: Transitions favor nearby ordinal values (for example, neighboring pixel intensities), mirroring continuous Gaussian diffusion in a discrete domain.
Nearest Neighbor Transitions: Transitions favor tokens that are close in an embedding space, so corruption follows semantic similarity in text data.

Matching the transition matrix to the structure of the data (ordinal for pixels, semantic for tokens) is the design decision the authors identify as responsible for improved results in the image and text domains.
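As a rough illustration of the transition matrices listed above, the following Python sketch builds uniform, absorbing, and Gaussian-like matrices and computes the marginal q(x_t | x_0) as one row of the product Q_1 Q_2 ... Q_t. The function names, the row-stochastic convention (entry [i][j] is the probability of moving from state i to state j), and the row-normalized Gaussian weighting are assumptions made for this sketch. In particular, the Gaussian-like matrix is a simplification, not necessarily the paper's exact parameterization, and the nearest-neighbor variant is omitted because it needs an embedding table.

```python
import math


def uniform_matrix(K, beta):
    """Keep the state with prob. 1 - beta, else resample uniformly over K states."""
    return [[(1.0 - beta) * (i == j) + beta / K for j in range(K)]
            for i in range(K)]


def absorbing_matrix(K, beta, mask_id):
    """Keep the state with prob. 1 - beta, else jump to mask_id.

    The row for mask_id is one-hot, so a masked state never leaves.
    """
    return [[(1.0 - beta) * (i == j) + beta * (j == mask_id) for j in range(K)]
            for i in range(K)]


def gaussian_like_matrix(K, sigma):
    """Row-normalized Gaussian weights on |i - j| (simplified assumption)."""
    rows = []
    for i in range(K):
        w = [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(K)]
        total = sum(w)
        rows.append([v / total for v in w])
    return rows


def matmul(A, B):
    """Plain matrix product of two square lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def marginal(x0, Qs):
    """q(x_t | x_0): row x0 of the product Q_1 Q_2 ... Q_t."""
    K = len(Qs[0])
    Q_bar = [[float(i == j) for j in range(K)] for i in range(K)]
    for Q in Qs:
        Q_bar = matmul(Q_bar, Q)
    return Q_bar[x0]


if __name__ == "__main__":
    # Four states, the last one acting as the mask; three steps with beta = 0.5.
    A = absorbing_matrix(4, 0.5, mask_id=3)
    print(marginal(0, [A, A, A]))  # [0.125, 0.0, 0.0, 0.875]
```

In the usage lines, state 0 survives three steps with probability 0.5^3 and all remaining mass sits on the mask state, which is the behavior the absorbing process is meant to have.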

Empirical Results

The models are evaluated on text and image datasets:

Text Generation: On the character-level text8 dataset, the models reached negative log-likelihoods competitive with established non-autoregressive methods, with the absorbing-state D3PM performing best among the variants. The approach also scales to the large vocabulary of LM1B.
Image Generation: On CIFAR-10, D3PMs approached the sample quality of the continuous-space DDPM and exceeded its log-likelihood, with the discretized Gaussian transitions yielding high-quality samples.

Theoretical Implications and Connections

The absorbing-state process connects diffusion models with autoregressive and BERT-like masked models. Masked tokens play the role of the absorbing state, so the reverse process learns to fill in masks much as a masked language model does. An autoregressive model corresponds to a discrete diffusion in which tokens are masked, and then regenerated, one position at a time across the sequence. The paper uses these correspondences to illustrate the flexibility of D3PMs.
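The link to masked models can be made concrete with a small sketch of the absorbing forward process on a token sequence. The schedule beta_t = 1 / (T - t + 1) is an assumption chosen for this illustration: it makes the probability that a token is masked after t steps equal to t / T, so every token is masked at step T. The names `mask_schedule`, `corrupt`, and `masked_fraction`, and the "[MASK]" string, are hypothetical.

```python
import random


def mask_schedule(T):
    """Assumed schedule: beta_t = 1 / (T - t + 1) for t = 1..T."""
    return [1.0 / (T - t + 1) for t in range(1, T + 1)]


def corrupt(tokens, betas, mask="[MASK]", rng=random):
    """Run the absorbing forward process; return the sequence after each step.

    At each step an unmasked token is replaced by the mask with probability
    beta, and an already masked token stays masked.
    """
    x = list(tokens)
    history = []
    for beta in betas:
        x = [tok if tok == mask or rng.random() >= beta else mask for tok in x]
        history.append(list(x))
    return history


def masked_fraction(betas, t):
    """Probability that a single token is masked after t steps."""
    keep = 1.0
    for beta in betas[:t]:
        keep *= 1.0 - beta
    return 1.0 - keep


if __name__ == "__main__":
    betas = mask_schedule(5)
    for step, seq in enumerate(corrupt(list("text8"), betas), start=1):
        print(step, round(masked_fraction(betas, step), 2), seq)
```

A reverse model trained on such corrupted sequences has to predict the original tokens at masked positions, which is the task a BERT-like model is trained on. Masking exactly one position per step in a fixed order, instead of at random, would give the autoregressive special case described above.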

Future Directions

The paper points to future work such as tuning noise schedules and exploring alternative loss functions that better match training to the structure of discrete data. D3PMs may also prove useful in other discrete-data applications, including natural language processing and categorical image generation.

In summary, the authors adapt diffusion models to discrete data and show that structured transition matrices improve on the uniform corruption used in prior work, in both the image and text domains. The connections they draw to autoregressive and masked models give later theoretical and practical work on diffusion-based generative models a common starting point.