
Generalized Denoising Diffusion Codebook Model

Updated 21 March 2026
  • gDDCM is a unified framework that extends DDCM by employing adjustable noise injection and codebook quantization across diverse diffusion models.
  • It integrates deterministic ODE and stochastic SDE approaches to tokenization, enabling near-lossless image reconstruction with improved performance on datasets like CIFAR-10 and LSUN Bedroom.
  • Experimental results show that gDDCM enhances fidelity and compression efficiency, achieving superior FID, LPIPS, IS, and SSIM metrics compared to standard DDCM.

The Generalized Denoising Diffusion Codebook Model (gDDCM) is an extension of the Denoising Diffusion Codebook Model (DDCM), designed to enable discrete tokenization and compression of images under a broad class of diffusion-type generative models. gDDCM replaces the injection of novel Gaussian noise in the backward process of pre-trained diffusion models with a codebook-based quantization scheme, thereby emitting a compact, lossless or near-lossless bitstream representing the generated or reconstructed sample. The generalization introduced by gDDCM covers Denoising Diffusion Probabilistic Models (DDPM), continuous score-based models, consistency models, and rectified flow/flow-matching methods. This model provides both a unifying framework and practical algorithms for image tokenization and compression, which can operate in either stochastic or deterministic (ODE-based) diffusion settings (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).

1. Conceptual Foundation and Main Contributions

gDDCM generalizes the discrete tokenization mechanism of DDCM beyond DDPM to all principal variants of diffusion models, including score-based SDEs, deterministic ODE-score models (e.g., consistency, rectified flow), and their hybrids. The central innovation is a unified forward and backward process parameterized by $p \in [0,1]$, which recovers classic DDCM as a special case ($p = \frac{1}{2}$ in DDPM) and allows flexible tuning of noise injection. The framework supports:

  • Extraction of a finite-length sequence of codebook indices (tokens) $\{\ell_k\}$ representing an input image.
  • Reconstruction of high-fidelity approximations $\hat x_0$ using only these discrete tokens.
  • Deployment in both discrete-schedule (DDPM-like) and continuous time (SDE/ODE, consistency, rectified-flow) generative settings.
  • Recovery and improvement of DDCM as a limiting case and demonstration of improved sample quality and compression performance compared to the original (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).

2. Mathematical Formulation and Unified Marginals

gDDCM leverages the observation that all mainstream diffusion-type models admit a marginal distribution of the form:

$$x_t = s(t)\, x_0 + \sigma(t)\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

with $s(t)$ and $\sigma(t)$ chosen appropriately for each underlying diffusion process. The "backward" or tokenization step, which transitions from time $t$ to $t - p\Delta t$, proceeds as:

  1. Deterministic ODE-style update: Use the pretrained model to compute the predicted clean image $\hat x_0$ and, where applicable, the predicted noise $\epsilon'$, then advance deterministically.
  2. Codebook quantization: The required stochastic increment is quantized to the nearest codebook vector from a fixed collection $\mathcal{E} = \{E_1, \dots, E_K\}$ (drawn once from $\mathcal{N}(0, I)$ or regenerated from a seed for stateless recovery), indexed by $\ell_k$.

The one-step update with this quantization has the $O(\Delta t^2)$-accurate form:

$$x_{t-p\Delta t} = s(t-p\Delta t)\, \hat x_0 + \sigma(t-p\Delta t)\, \epsilon' + \sqrt{\sigma(t)^2 - \sigma(t-p\Delta t)^2} \cdot E_c$$

where $E_c$ is the codebook vector that best matches the required perturbation. The discrete codebook quantization satisfies:

$$E_c = \arg\min_{E \in \mathcal{E}} \|E - e'\|^2 \approx \arg\max_{E \in \mathcal{E}} \langle \hat x_0 - x_0, E \rangle$$
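The step from a nearest-neighbor rule to an inner-product rule follows from expanding the squared distance. The derivation below is a sketch only: the dimension $d$ and the premise that $e'$ is a positive multiple of $\hat x_0 - x_0$ are assumptions introduced here for illustration and are not stated in the source.

```latex
\begin{aligned}
\|E - e'\|^2 &= \|E\|^2 - 2\langle E, e' \rangle + \|e'\|^2, \\
\arg\min_{E \in \mathcal{E}} \|E - e'\|^2
  &= \arg\max_{E \in \mathcal{E}} \Big( \langle E, e' \rangle - \tfrac{1}{2}\|E\|^2 \Big)
  \approx \arg\max_{E \in \mathcal{E}} \langle E, e' \rangle .
\end{aligned}
```

The term $\|e'\|^2$ does not depend on $E$ and drops out. The final approximation relies on the fact that for $E \sim \mathcal{N}(0, I_d)$ with large $d$, $\|E\|^2$ concentrates around $d$ and is therefore nearly the same for every codebook entry. Under the stated premise on $e'$, the right-hand side reduces to the inner-product form given above.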

For $p = 0$, the process becomes fully deterministic (ODE inversion), and the quantizer reduces to DDIM inversion with codebook selection per step.
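The one-step update and the nearest-codeword rule from this section can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names `nearest_codeword` and `quantized_step` are hypothetical, and the target perturbation `e_prime` is passed in as an input because the source does not spell out how it is computed.

```python
import numpy as np

def nearest_codeword(codebook, e_prime):
    """Return (index, vector) of the codebook row closest to e_prime in L2."""
    # codebook has shape (K, d); e_prime has shape (d,)
    dists = np.sum((codebook - e_prime) ** 2, axis=1)
    idx = int(np.argmin(dists))
    return idx, codebook[idx]

def quantized_step(x0_hat, eps_pred, e_prime, codebook,
                   s_next, sigma_t, sigma_next):
    """Deterministic part plus a quantized stochastic increment.

    Transcribes
      x_next = s_next * x0_hat + sigma_next * eps_pred
               + sqrt(sigma_t^2 - sigma_next^2) * E_c
    where E_c is the codeword nearest to e_prime.
    """
    idx, e_c = nearest_codeword(codebook, e_prime)
    noise_scale = np.sqrt(max(sigma_t**2 - sigma_next**2, 0.0))
    x_next = s_next * x0_hat + sigma_next * eps_pred + noise_scale * e_c
    return x_next, idx

# Toy inputs; every value below is illustrative only.
rng = np.random.default_rng(0)
d, K = 8, 32
codebook = rng.standard_normal((K, d))   # drawn once from N(0, I)
x0_hat = rng.standard_normal(d)          # stand-in for the model's clean-image estimate
eps_pred = rng.standard_normal(d)        # stand-in for the model's noise estimate
e_prime = rng.standard_normal(d)         # stand-in for the required perturbation

x_next, idx = quantized_step(x0_hat, eps_pred, e_prime, codebook,
                             s_next=1.0, sigma_t=0.5, sigma_next=0.4)
```

When `sigma_next` equals `sigma_t`, the square-root factor is zero and the update contains no codebook term, which is how this sketch reflects the deterministic limit.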

3. Algorithms and Procedural Steps

gDDCM provides algorithmic prescriptions for both continuous and discrete-time variants:

  • (Alg. 1, $p \neq 0$): Step through the time axis; at each iteration:
    • Compute $\Delta t$ and obtain the model outputs $\hat x_0$, $\epsilon'$;
    • Update $x_{t-p\Delta t}$ using the ODE term and quantize the noise increment to the closest $E_c$ in $\mathcal{E}$;
    • Store the codebook index $\ell_k$, decrement time, and repeat for $N$ total steps.
    • Optionally, perform a final reverse ODE or DDIM step for improved reconstruction fidelity.
  • (Alg. 2, $p = 0$): Use explicit DDIM inversion with codebook quantization at each discrete step.

The decoder replays the steps in reverse, using the same codebooks and indices to reconstruct an approximation of the original $x_0$, with fidelity determined by the codebook size $K$ and the token length $N$.
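The bookkeeping shared by encoder and decoder can be illustrated with a toy round trip. The sketch below is not the paper's Algorithm 1 or 2. It assumes $s(t) = 1$ with a decreasing list of noise levels, uses a hypothetical closed-form `toy_denoiser` in place of a pretrained network, and defines the target perturbation as the one that would land exactly on $x_0$. In this toy, the decoder simply re-runs the same updates with the stored indices. The point is that seeded codebooks plus the token list suffice to reproduce the encoder's output; the sketch makes no claim about reconstruction quality.

```python
import numpy as np

def codebook_for_step(seed, step, K, d):
    # Regenerate the step's codebook from (seed, step); nothing is stored.
    return np.random.default_rng([seed, 1, step]).standard_normal((K, d))

def toy_denoiser(x_t, sigma):
    # Hypothetical stand-in for a pretrained model, assuming s(t) = 1 and
    # data ~ N(0, I): posterior mean of x_0 and the implied noise estimate.
    x0_hat = x_t / (1.0 + sigma**2)
    eps_pred = (x_t - x0_hat) / sigma
    return x0_hat, eps_pred

def run_codec(sigmas, seed, K, d, x0=None, tokens=None):
    """Encode when tokens is None (x0 required); otherwise decode from tokens."""
    encoding = tokens is None
    chosen = [] if encoding else list(tokens)
    # Shared, seeded starting point known to both encoder and decoder.
    x = sigmas[0] * np.random.default_rng([seed, 0]).standard_normal(d)
    for k in range(len(sigmas) - 1):
        sig_t, sig_next = sigmas[k], sigmas[k + 1]
        x0_hat, eps_pred = toy_denoiser(x, sig_t)
        base = x0_hat + sig_next * eps_pred        # deterministic part
        scale = np.sqrt(sig_t**2 - sig_next**2)    # size of the quantized increment
        codebook = codebook_for_step(seed, k, K, d)
        if encoding:
            # Assumed target: the perturbation that would land exactly on x0.
            e_prime = (x0 - base) / scale
            dists = np.sum((codebook - e_prime) ** 2, axis=1)
            chosen.append(int(np.argmin(dists)))
        x = base + scale * codebook[chosen[k]]
    x0_rec, _ = toy_denoiser(x, sigmas[-1])        # final deterministic step
    return x0_rec, chosen

d, K, N, seed = 16, 64, 30, 7
sigmas = np.linspace(1.0, 0.05, N + 1)             # strictly decreasing, all > 0
x0 = np.random.default_rng(123).standard_normal(d)

x_enc, tokens = run_codec(sigmas, seed, K, d, x0=x0)   # encoder sees x0
x_dec, _ = run_codec(sigmas, seed, K, d, tokens=tokens)  # decoder sees only tokens
```

The decoder never touches `x0`; it needs only `seed`, the schedule, and `tokens`, which is the sense in which the token sequence is the compressed representation.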

4. Application to Diffusion Model Variants

gDDCM directly recovers and extends DDCM and applies to a wide array of diffusion frameworks:

| Model Variant | Forward/Backward Rule | Notes |
|---|---|---|
| DDPM (discrete/DDIM) | Eq. 11, stepwise, $p = 0.5, 0$ | DDCM at $p = 0.5$ |
| Score-based/SDE | Continuous, use Eq. (21) with predicted score | |
| Consistency Model | As SDE | Deterministic map |
| Rectified-Flow/ODE | Euler step via Thm 1, Eq. (21) | ODE, $O(\Delta t^2)$ |

gDDCM thus enables codebook-based compression and generation in fully deterministic settings (ODE-based) and stochastic SDE settings, as well as hybrid models, by proper parameterization of the update step and quantization process (Kong, 17 Nov 2025).
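Each model family enters the shared marginal $x_t = s(t)\, x_0 + \sigma(t)\, \epsilon$ through its own pair $(s(t), \sigma(t))$. The sketch below lists three commonly used choices. They are standard textbook parameterizations, not values taken from the source, and the schedules of a given pretrained model may differ. The function names and the default `beta_min`/`beta_max` values are illustrative.

```python
import numpy as np

def vp_schedule(t, beta_min=0.1, beta_max=20.0):
    # Variance-preserving (DDPM-like, continuous time): s^2 + sigma^2 = 1.
    log_alpha_bar = -(beta_min * t + 0.5 * (beta_max - beta_min) * t**2)
    s = np.exp(0.5 * log_alpha_bar)
    return s, np.sqrt(1.0 - s**2)

def ve_schedule(t):
    # Variance-exploding / EDM-style: signal left unscaled, sigma(t) = t.
    return 1.0, t

def rectified_flow_schedule(t):
    # Straight-line interpolation with data at t = 0 and noise at t = 1.
    return 1.0 - t, t

def sample_marginal(x0, t, schedule, rng):
    """Draw x_t = s(t) * x0 + sigma(t) * eps and also return eps."""
    s, sigma = schedule(t)
    eps = rng.standard_normal(x0.shape)
    return s * x0 + sigma * eps, eps

rng = np.random.default_rng(0)
x0 = np.ones(4)
for name, schedule in [("VP", vp_schedule),
                       ("VE", ve_schedule),
                       ("rectified flow", rectified_flow_schedule)]:
    x_t, _ = sample_marginal(x0, 0.5, schedule, rng)
    print(name, schedule(0.5), x_t.shape)
```

Because only `schedule` changes between families, a tokenizer written against `s` and `sigma` can in principle be reused across them, which is the practical content of the unified marginal.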

5. Empirical Performance and Results

Extensive experiments on CIFAR-10 and LSUN Bedroom datasets demonstrate that gDDCM with $p = 0$ (ODE-style, fully deterministic) consistently achieves superior or comparable generative fidelity and compression metrics relative to DDCM at $p = 0.5$ (random or partially stochastic backward noise injection):

CIFAR-10 (N=300 tokens)

| Model, $p$ | FID $\downarrow$ | LPIPS $\downarrow$ | IS $\uparrow$ | SSIM $\uparrow$ |
|---|---|---|---|---|
| DDPM, $p = 0.5$ | 7.7 | 0.138 | 9.67 | 0.93 |
| DDPM, $p = 0$ | 3.2 | 0.060 | 10.5 | 0.98 |
| EDM, $p = 0.5$ | 4.5 | 0.099 | 10.3 | 0.95 |
| EDM, $p = 0$ | 4.3 | 0.078 | 10.9 | 0.96 |
| CM, $p = 0.5$ | × (fails) | | | |
| CM, $p = 0$ | 4.3 | 0.049 | 10.1 | 0.98 |
| ReFlow, $p = 0$ | (best) | 0.049 | 10.1 | 0.98 |

On LSUN Bedroom ($256 \times 256$), only LPIPS and SSIM were reported; gDDCM ($p = 0$) consistently surpassed DDCM ($p = \frac{1}{2}$) with LPIPS $\approx 0.03$ and SSIM $\approx 0.99$.

Qualitatively, reconstructions from token streams are nearly indistinguishable from originals, and intermediate states ($x_t$) after minimal noise injection preserve most image content. For all model classes and datasets, $N \approx 300$ tokens suffice for near-lossless fidelity (Kong, 17 Nov 2025).

6. Codebook Construction, Training, and Hyperparameters

  • Codebooks: Each codebook consists of $K$ vectors drawn once from $\mathcal{N}(0, I)$; the selected indices form the compressed "token sequence." For memory efficiency, codebooks can be regenerated from a fixed seed.
  • Training: gDDCM requires no new loss functions; it reuses existing pretrained diffusion/consistency/flow models trained under standard objectives (denoising score-matching, etc.). Tokenization and reconstruction happen entirely at inference time.
  • Scheduler and $p$ tuning: Optimal performance requires a grid search over the step schedule $\Delta t(k)$ and the noise control parameter $p$ on a held-out set.
  • Variants: Each time $t$ (or step) uses its own codebook. This is currently not amortized, though pseudo-random codebook generation limits the memory burden. Extensions to adaptive/learned codebooks or vector quantization remain unexplored.
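The storage cost implied by the points above can be estimated with simple arithmetic. The sketch below assumes fixed-length coding (each token costs $\log_2 K$ bits, with no entropy coding) and one token per step. The token count $N = 300$ and the CIFAR-10 image shape come from the results section, whereas $K = 256$ is a hypothetical value chosen only to make the numbers concrete.

```python
import math

def token_bits(num_tokens, codebook_size):
    # Each token indexes one of K codewords, so it costs log2(K) bits.
    return num_tokens * math.log2(codebook_size)

def bits_per_pixel(num_tokens, codebook_size, height, width):
    return token_bits(num_tokens, codebook_size) / (height * width)

def stored_codebook_bytes(num_steps, codebook_size, dim, bytes_per_value=4):
    # Memory needed if every per-step codebook were kept in RAM (float32).
    return num_steps * codebook_size * dim * bytes_per_value

N, K = 300, 256           # K = 256 is hypothetical
H, W, C = 32, 32, 3       # CIFAR-10 image shape

print(token_bits(N, K))                          # 2400.0
print(bits_per_pixel(N, K, H, W))                # 2.34375
print(stored_codebook_bytes(N, K, H * W * C))    # 943718400
```

Under these assumptions, materializing all per-step codebooks would take roughly 0.94 GB, while the bitstream itself is 300 bytes. That gap is why regenerating codebooks from a seed is attractive.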

7. Limitations, Open Problems, and Extension Directions

Known limitations of gDDCM include:

  • The need for manual or grid-searched tuning of the schedule $\Delta t(k)$ and parameter $p$.
  • For $p \neq 0$, backward noise injection and the reverse sampling process are coupled; if the reverse sampler is not optimal (e.g., due to large discretization error), compression quality may deteriorate.
  • Codebook-per-time-step is required, but memory overhead is mitigated by procedural generation; increasing codebook efficiency via learning remains a promising direction.
  • For large $t$ (high noise), the conditional deviates substantially from the marginal distribution, suggesting that tokenization should avoid overly noisy starting points.
  • Possible extensions include end-to-end joint finetuning (model and codebooks), adaptive codebook sizes, and application to multimodal data such as video.
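The tuning burden mentioned in the first limitation amounts to an exhaustive search over $p$ and the step schedule. A minimal sketch of such a search is given below. The helper `grid_search` and the objective `toy_distortion` are hypothetical: the objective is an arbitrary bowl-shaped function with no empirical meaning, standing in for a real held-out evaluation that would tokenize and reconstruct validation images.

```python
import itertools

def grid_search(evaluate, p_values, step_counts):
    """Return (best_score, best_p, best_n) minimizing evaluate(p, n)."""
    best = None
    for p, n in itertools.product(p_values, step_counts):
        score = evaluate(p, n)
        if best is None or score < best[0]:
            best = (score, p, n)
    return best

def toy_distortion(p, n):
    # Placeholder for a held-out reconstruction error; the shape of this
    # function is arbitrary and says nothing about the real method.
    return (p - 0.25) ** 2 + 1.0 / n

best_score, best_p, best_n = grid_search(
    toy_distortion,
    p_values=[0.0, 0.25, 0.5, 1.0],
    step_counts=[50, 100, 300],
)
print(best_p, best_n)
```

The cost grows with the product of the two grid sizes, and each evaluation involves a full encode/decode pass, which is why the source lists this tuning as a limitation.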

A plausible implication is that improved codebook construction and integration with adaptive or learned quantization strategies could further enhance compression efficiency and generalization to non-image domains (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).
