Generalized Denoising Diffusion Codebook Model

Updated 21 March 2026

gDDCM is a unified framework that extends DDCM by employing adjustable noise injection and codebook quantization across diverse diffusion models.
It integrates deterministic ODE and stochastic SDE approaches to tokenization, enabling near-lossless image reconstruction with improved performance on datasets like CIFAR-10 and LSUN Bedroom.
Experimental results show that gDDCM enhances fidelity and compression efficiency, achieving superior FID, LPIPS, IS, and SSIM metrics compared to standard DDCM.

The Generalized Denoising Diffusion Codebook Model (gDDCM) is an extension of the Denoising Diffusion Codebook Model (DDCM), designed to enable discrete tokenization and compression of images under a broad class of diffusion-type generative models. gDDCM replaces the fresh Gaussian noise injected in the backward process of a pre-trained diffusion model with vectors selected from a codebook. The process therefore emits a compact bitstream of codebook indices, from which the generated or reconstructed sample can be recovered losslessly or near-losslessly. The generalization covers Denoising Diffusion Probabilistic Models (DDPM), continuous score-based models, consistency models, and rectified flow/flow-matching methods. The model provides both a unifying framework and practical algorithms for image tokenization and compression, and it can operate in either stochastic or deterministic (ODE-based) diffusion settings (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).

1. Conceptual Foundation and Main Contributions

gDDCM generalizes the discrete tokenization mechanism of DDCM beyond DDPM to all principal variants of diffusion models, including score-based SDEs, deterministic ODE-score models (e.g., consistency, rectified flow), and their hybrids. The central innovation is a unified forward and backward process parameterized by $p \in [0,1]$ , which recovers classic DDCM as a special case ( $p=\frac{1}{2}$ in DDPM), and allows flexible tuning of noise injection. The framework supports:

Extraction of a finite-length sequence of codebook indices (tokens) $\{\ell_k\}$ representing an input image.
Reconstruction of high-fidelity approximations $\hat x_0$ using only these discrete tokens.
Deployment in both discrete-schedule (DDPM-like) and continuous time (SDE/ODE, consistency, rectified-flow) generative settings.
Recovery of DDCM as a special case, together with improved sample quality and compression performance compared to the original DDCM (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).

2. Mathematical Formulation and Unified Marginals

gDDCM leverages the observation that all mainstream diffusion-type models admit a marginal distribution of the form:

$x_t = s(t) \cdot x_0 + \sigma(t) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$

with $s(t)$ , $\sigma(t)$ chosen appropriately for each underlying diffusion process. The "backward" or tokenization step, to transition from time $t$ to $t-p\Delta t$ , proceeds as:

Deterministic ODE-style update: Use the pretrained model to compute the predicted clean image $\hat x_0$ and, where applicable, the predicted noise $\hat\epsilon$, and advance deterministically.
Codebook quantization: Quantize the required stochastic increment to the nearest vector in a fixed codebook $\mathcal{C}_t = \{c_1, \dots, c_K\}$ (drawn once from $\mathcal{N}(0, I)$, or regenerated from a seed for stateless recovery), indexed by $\ell \in \{1, \dots, K\}$.

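Up to discretization error, the one-step update adds a suitably scaled codebook vector $c_{\ell^\ast}$ to the deterministic ODE term. This vector takes the place of the freshly sampled Gaussian noise that a stochastic sampler would inject, and $c_{\ell^\ast}$ is the codebook vector that best matches the required perturbation. Writing $\delta_t$ for that perturbation, the quantization selects the nearest of the $K$ codebook entries $c_1, \dots, c_K$:

$\ell^\ast = \arg\min_{\ell \in \{1, \dots, K\}} \lVert \delta_t - c_\ell \rVert_2$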
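The nearest-codebook rule fits in a few lines of Python. This is a minimal sketch under stated assumptions. The hypothetical helper `make_codebook` regenerates one step's codebook of $K$ standard-Gaussian vectors from an integer seed, mirroring the seeded, stateless recovery mentioned above. The hypothetical `quantize` returns the index of the entry closest in Euclidean distance to a required perturbation `delta`. The seeding scheme, dimension, and sizes are illustrative choices and are not taken from the paper.

```python
from __future__ import annotations

import random


def make_codebook(seed: int, step: int, size: int, dim: int) -> list[list[float]]:
    """Regenerate one time step's codebook from a seed (hypothetical seeding scheme).

    Entries are drawn i.i.d. from N(0, I), so only (seed, step) must be stored.
    """
    rng = random.Random(seed * 1_000_003 + step)  # separate reproducible stream per step
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def quantize(delta: list[float], codebook: list[list[float]]) -> int:
    """Return the index of the codebook entry nearest to `delta` in Euclidean distance."""

    def sq_dist(entry: list[float]) -> float:
        return sum((d - c) ** 2 for d, c in zip(delta, entry))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


if __name__ == "__main__":
    book = make_codebook(seed=7, step=3, size=16, dim=8)
    # A slightly perturbed copy of entry 5 should quantize back to index 5.
    nudged = [v + 0.01 for v in book[5]]
    print(quantize(nudged, book))
```

Because the codebook is a pure function of `(seed, step)`, encoder and decoder only need to agree on the seed. The codebook vectors themselves never have to be stored or transmitted.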

For the setting of $p$ that removes stochastic noise injection, the process becomes fully deterministic (ODE inversion), and the quantizer reduces to DDIM inversion with per-step codebook selection.
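Under the shared marginal $x_t = s(t)\,x_0 + \sigma(t)\,\epsilon$, a deterministic DDIM-style move from time $t$ to another time $t'$ re-noises the model's estimates:

$x_{t'} = s(t')\,\hat x_0 + \sigma(t')\,\hat\epsilon$

Here $\hat\epsilon = (x_t - s(t)\,\hat x_0)/\sigma(t)$. Choosing $t' < t$ gives a denoising step, and $t' > t$ gives an inversion step. The sketch below makes two illustrative assumptions: a rectified-flow schedule $s(t) = 1 - t$, $\sigma(t) = t$, and an exact oracle $\hat x_0$ in place of a trained network. The function name `ddim_move` is hypothetical.

```python
from __future__ import annotations


def s(t: float) -> float:
    """Signal scale of the assumed rectified-flow marginal, s(t) = 1 - t."""
    return 1.0 - t


def sigma(t: float) -> float:
    """Noise scale of the assumed rectified-flow marginal, sigma(t) = t."""
    return t


def ddim_move(x_t: list[float], x0_hat: list[float], t: float, t_next: float) -> list[float]:
    """Deterministic move x_t -> x_{t_next} under x_t = s(t) x0 + sigma(t) eps.

    t_next < t is a denoising step; t_next > t is a DDIM-inversion step.
    Requires sigma(t) > 0 so that the implied noise eps_hat is defined.
    """
    eps_hat = [(x - s(t) * a) / sigma(t) for x, a in zip(x_t, x0_hat)]
    return [s(t_next) * a + sigma(t_next) * e for a, e in zip(x0_hat, eps_hat)]


if __name__ == "__main__":
    x0, eps, t = [0.5, -1.0, 2.0], [0.3, 0.1, -0.7], 0.6
    x_t = [s(t) * a + sigma(t) * e for a, e in zip(x0, eps)]
    # With an oracle x0_hat = x0, both moves land on the marginal at the new time.
    print(ddim_move(x_t, x0, t, 0.2))  # denoise toward t = 0.2
    print(ddim_move(x_t, x0, t, 0.9))  # invert toward t = 0.9
```

For this schedule, the move coincides with an Euler step of the rectified-flow ODE with velocity $\hat\epsilon - \hat x_0$. With an exact $\hat x_0$, inverting and then denoising returns the starting point.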

3. Algorithms and Procedural Steps

gDDCM provides algorithmic prescriptions for both continuous and discrete-time variants:

(Alg. 1, continuous time): Step through the time axis; at each iteration:
- Evaluate the pretrained model at the current state and time to obtain $\hat x_0$ and $\hat\epsilon$;
- Update $x_t$ using the ODE term and quantize the noise increment to the closest codebook vector $c_\ell$ in the step's codebook $\mathcal{C}_t$;
- Store the codebook index $\ell_k$, decrement the time, and repeat for $N$ total steps.
- Optionally, perform a final reverse ODE or DDIM step for improved reconstruction fidelity.
(Alg. 2, discrete time): Use explicit DDIM inversion with codebook quantization at each discrete step.

The decoder replays the same sequence of backward steps, using the same codebooks and indices to reconstruct an approximation of the original $x_0$. Fidelity is determined by the codebook size $K$ and the token length $N$.
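To make the encoder/decoder symmetry concrete, here is a toy, self-contained sketch in Python. Everything model-specific is an assumption made for illustration:

- a rectified-flow schedule $s(t) = 1 - t$, $\sigma(t) = t$;
- a closed-form posterior-mean "denoiser" for Gaussian data, standing in for a pretrained network;
- a DDIM-style update in which a fraction `eta_frac` of $\sigma(t')$ is carried by the codebook vector;
- a DDCM-style steering rule that quantizes the scaled residual $s(t')(x_0 - \hat x_0)/\eta$.

The names `run`, `toy_x0_hat`, `codebook`, and `eta_frac` are hypothetical. `eta_frac` is not the paper's $p$.

```python
from __future__ import annotations

import math
import random


def s(t: float) -> float:  # assumed rectified-flow signal scale
    return 1.0 - t


def sigma(t: float) -> float:  # assumed rectified-flow noise scale
    return t


def toy_x0_hat(x: list[float], t: float, var: float = 1.0) -> list[float]:
    """Stand-in for a pretrained model: exact E[x0 | x_t] for data ~ N(0, var * I)."""
    gain = s(t) * var / (s(t) ** 2 * var + sigma(t) ** 2)
    return [gain * xi for xi in x]


def codebook(seed: int, step: int, size: int, dim: int) -> list[list[float]]:
    rng = random.Random(seed * 1_000_003 + step)  # reproducible per-step codebook
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(size)]


def run(x0: list[float] | None, tokens: list[int] | None, dim: int, n_steps: int = 20,
        size: int = 64, eta_frac: float = 0.5, seed: int = 0) -> tuple[list[float], list[int]]:
    """Encode (x0 given, tokens None) or decode (tokens given) along the same path."""
    times = [1.0 - k / n_steps for k in range(n_steps + 1)]
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # seeded starting noise, shared by both sides
    emitted: list[int] = []
    for k in range(n_steps):
        t, t_next = times[k], times[k + 1]
        x0_hat = toy_x0_hat(x, t)
        if sigma(t_next) == 0.0:  # final deterministic step: output the clean estimate
            x = x0_hat
            break
        eps_hat = [(xi - s(t) * a) / sigma(t) for xi, a in zip(x, x0_hat)]
        eta = eta_frac * sigma(t_next)  # share of sigma(t_next) carried by the codebook vector
        keep = math.sqrt(sigma(t_next) ** 2 - eta ** 2)
        book = codebook(seed, k, size, dim)
        if tokens is None:  # encoder: quantize the increment that steers toward x0
            delta = [s(t_next) * (a - b) / eta for a, b in zip(x0, x0_hat)]
            idx = min(range(size),
                      key=lambda i: sum((d - c) ** 2 for d, c in zip(delta, book[i])))
        else:  # decoder: replay the stored index
            idx = tokens[k]
        emitted.append(idx)
        x = [s(t_next) * a + keep * e + eta * c for a, e, c in zip(x0_hat, eps_hat, book[idx])]
    return x, emitted
```

The decoder regenerates the starting noise and every per-step codebook from the shared seed, then replays the stored indices, so it retraces the encoder's trajectory exactly. With `n_steps=20`, only 19 indices are emitted, because the last step is deterministic.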

4. Application to Diffusion Model Variants

gDDCM directly recovers and extends DDCM and applies to a wide array of diffusion frameworks:

Model Variant	Forward/Backward Rule	Notes
DDPM (discrete/DDIM)	Eq. 11, stepwise	DDCM as $p=\frac{1}{2}$
Score-based/SDE	Continuous, use Eq. (21)	with predicted score
Consistency Model	As SDE	deterministic map
Rectified-Flow/ODE	Euler step via Thm 1, Eq. (21)	ODE
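Because every family shares the marginal $x_t = s(t)\,x_0 + \sigma(t)\,\epsilon$, the variants differ mainly in $s(t)$ and $\sigma(t)$. The sketch below lists common textbook choices, stated as assumptions rather than the exact schedules used in the experiments:

- the DDPM-style variance-preserving SDE with linear $\beta(t)$, $\beta_{\min} = 0.1$, $\beta_{\max} = 20$;
- the EDM/variance-exploding form $s(t) = 1$, $\sigma(t) = t$, which consistency models typically inherit;
- rectified flow with $s(t) = 1 - t$, $\sigma(t) = t$.

The helper names are hypothetical.

```python
from __future__ import annotations

import math


def vp_schedule(t: float, beta_min: float = 0.1, beta_max: float = 20.0) -> tuple[float, float]:
    """DDPM-style variance-preserving SDE with linear beta(t) on t in [0, 1].

    alpha_bar(t) = exp(-integral_0^t beta), s = sqrt(alpha_bar), sigma = sqrt(1 - alpha_bar).
    """
    alpha_bar = math.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2))
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)


def edm_schedule(t: float) -> tuple[float, float]:
    """EDM / variance-exploding form x_t = x0 + t * eps (sigma(t) = t)."""
    return 1.0, t


def rectified_flow_schedule(t: float) -> tuple[float, float]:
    """Rectified flow: linear interpolation x_t = (1 - t) x0 + t * eps on t in [0, 1]."""
    return 1.0 - t, t


SCHEDULES = {
    "DDPM (VP)": vp_schedule,
    "EDM / consistency (VE)": edm_schedule,
    "Rectified flow": rectified_flow_schedule,
}


def sample_marginal(x0: list[float], eps: list[float], t: float, schedule) -> list[float]:
    """Form x_t = s(t) * x0 + sigma(t) * eps for a given schedule and noise sample."""
    s_t, sigma_t = schedule(t)
    return [s_t * a + sigma_t * e for a, e in zip(x0, eps)]


if __name__ == "__main__":
    for name, fn in SCHEDULES.items():
        s_t, sigma_t = fn(0.5)
        print(f"{name:24s} s={s_t:.4f} sigma={sigma_t:.4f}")
```

Swapping the schedule function is, in this simplified view, all that changes between model families. The tokenization loop and the codebook logic stay the same.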

gDDCM thus enables codebook-based compression and generation in fully deterministic settings (ODE-based) and stochastic SDE settings, as well as hybrid models, by proper parameterization of the update step and quantization process (Kong, 17 Nov 2025).

5. Empirical Performance and Results

Extensive experiments on CIFAR-10 and LSUN Bedroom demonstrate that gDDCM in its ODE-style, fully deterministic setting consistently achieves superior or comparable generative fidelity and compression metrics relative to DDCM-style settings with random or partially stochastic backward noise injection:

CIFAR-10 (N=300 tokens)

Model, $p$	FID $\downarrow$	LPIPS $\downarrow$	IS $\uparrow$	SSIM $\uparrow$
DDPM, $p = \dots$	7.7	0.138	9.67	0.93
DDPM, $p = \dots$	3.2	0.060	10.5	0.98
EDM, $p = \dots$	4.5	0.099	10.3	0.95
EDM, $p = \dots$	4.3	0.078	10.9	0.96
CM, $p = \dots$	× (fails)	n/a	n/a	n/a
CM, $p = \dots$	4.3	0.049	10.1	0.98
ReFlow, $p = \dots$	(best)	0.049	10.1	0.98

On LSUN Bedroom, only LPIPS and SSIM were reported. gDDCM consistently surpassed DDCM, reaching an LPIPS of roughly 0.03 and an SSIM of roughly 0.99.

Qualitatively, reconstructions from token streams are nearly indistinguishable from the originals, and intermediate states $x_t$ reached after minimal noise injection preserve most of the image content. Across all model classes and datasets, the token budgets used suffice for near-lossless fidelity (Kong, 17 Nov 2025).

6. Codebook Construction, Training, and Hyperparameters

Codebooks: Each codebook consists of $K$ vectors drawn once from $\mathcal{N}(0, I)$, and the selected indices form the compressed "token sequence." For memory efficiency, codebooks can be regenerated from a fixed seed.
Training: gDDCM introduces no new loss functions. It reuses existing pretrained diffusion, consistency, or flow models trained under standard objectives (denoising score matching, etc.), and tokenization and reconstruction run entirely at inference time.
Scheduler and $p$ tuning: Optimal performance requires a grid search over the step schedule and the noise-control parameter $p$ on a held-out set.
Variants: Each time $t$ (or step) uses its own codebook. This is currently not amortized, although pseudo-random codebook generation reduces the memory burden. Extensions to adaptive or learned codebooks and vector quantization remain unexplored.
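The compressed representation is only the index sequence, so its size follows directly from the token count $N$ and the codebook size $K$. Assuming fixed-length coding of each index and a codebook seed agreed on in advance (so the seed costs no bits), the total is $N \lceil \log_2 K \rceil$ bits. The helpers below are hypothetical, and the codebook size in the example is illustrative rather than taken from the paper.

```python
import math


def token_bits(n_tokens: int, codebook_size: int) -> int:
    """Bits for N fixed-length indices into a codebook of K entries."""
    return n_tokens * math.ceil(math.log2(codebook_size))


def bits_per_pixel(n_tokens: int, codebook_size: int, height: int, width: int) -> float:
    """Bitrate normalized by spatial resolution (color channels not counted separately)."""
    return token_bits(n_tokens, codebook_size) / (height * width)


if __name__ == "__main__":
    # N = 300 tokens as in the CIFAR-10 setting; K = 256 is an illustrative assumption.
    print(token_bits(300, 256), bits_per_pixel(300, 256, 32, 32))
```

For a 32×32 CIFAR-10 image, 300 tokens from a hypothetical 256-entry codebook would cost 2400 bits, or about 2.34 bits per pixel. Larger $K$ or $N$ trades bitrate for fidelity.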

7. Limitations, Open Problems, and Extension Directions

Known limitations of gDDCM include:

The need for manual or grid-searched tuning of the step schedule and the parameter $p$.
In the stochastic regime, backward noise injection and the reverse sampling process are coupled; if the reverse sampler is not optimal (e.g., due to large discretization error), compression quality may deteriorate.
A separate codebook is required for each time step, although procedural generation mitigates the memory overhead; improving codebook efficiency through learning remains a promising direction.
At high noise levels, the conditional distribution deviates substantially from the marginal, which suggests that tokenization should avoid overly noisy starting points.
Possible extensions include end-to-end joint finetuning (model and codebooks), adaptive codebook sizes, and application to multimodal data such as video.

A plausible implication is that improved codebook construction and integration with adaptive or learned quantization strategies could further enhance compression efficiency and generalization to non-image domains (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).


References

Kong (2025). Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model.

Ohayon et al. (2025). Compressed Image Generation with Denoising Diffusion Codebook Models.
