Generalized Denoising Diffusion Codebook Model
- gDDCM is a unified framework that extends DDCM by employing adjustable noise injection and codebook quantization across diverse diffusion models.
- It integrates deterministic ODE and stochastic SDE approaches to tokenization, enabling near-lossless image reconstruction with improved performance on datasets like CIFAR-10 and LSUN Bedroom.
- Experimental results show that gDDCM enhances fidelity and compression efficiency, achieving superior FID, LPIPS, IS, and SSIM metrics compared to standard DDCM.
The Generalized Denoising Diffusion Codebook Model (gDDCM) is an extension of the Denoising Diffusion Codebook Model (DDCM), designed to enable discrete tokenization and compression of images under a broad class of diffusion-type generative models. gDDCM replaces the injection of novel Gaussian noise in the backward process of pre-trained diffusion models with a codebook-based quantization scheme, thereby emitting a compact, lossless or near-lossless bitstream representing the generated or reconstructed sample. The generalization introduced by gDDCM covers Denoising Diffusion Probabilistic Models (DDPM), continuous score-based models, consistency models, and rectified flow/flow-matching methods. This model provides both a unifying framework and practical algorithms for image tokenization and compression, which can operate in either stochastic or deterministic (ODE-based) diffusion settings (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).
1. Conceptual Foundation and Main Contributions
gDDCM generalizes the discrete tokenization mechanism of DDCM beyond DDPM to all principal variants of diffusion models, including score-based SDEs, deterministic ODE-score models (e.g., consistency, rectified flow), and their hybrids. The central innovation is a unified forward and backward process governed by a noise-control parameter, which recovers classic DDCM as a special case in the DDPM setting and allows flexible tuning of noise injection. The framework supports:
- Extraction of a finite-length sequence of codebook indices (tokens) representing an input image.
- Reconstruction of high-fidelity approximations using only these discrete tokens.
- Deployment in both discrete-schedule (DDPM-like) and continuous time (SDE/ODE, consistency, rectified-flow) generative settings.
- Recovery of DDCM as a limiting case, with improved sample quality and compression performance compared to the original (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).
2. Mathematical Formulation and Unified Marginals
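gDDCM leverages the observation that all mainstream diffusion-type models admit a marginal distribution of the same form: the noisy sample at a given time is a scaled copy of the clean image plus scaled Gaussian noise, with the two time-dependent coefficients chosen appropriately for each underlying diffusion process. The "backward" or tokenization step, which moves from one time point to the adjacent one, proceeds as: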
- Deterministic ODE-style update: Use the pretrained model to compute the predicted clean image and, where applicable, the predicted noise, then advance deterministically.
- Codebook quantization: The required stochastic increment is quantized to the nearest vector in a fixed codebook (drawn once from a standard Gaussian, or regenerated from a seed for stateless recovery), and the index of the selected vector is recorded.
The one-step update with this quantization has the -accurate form:
where is the best-matching codebook vector to the required perturbation. The discrete codebook quantization satisfies:
For , the process becomes fully deterministic (ODE inversion), and the quantizer reduces to DDIM inversion with codebook selection per step.
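The quantization step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes standard-Gaussian codebook entries and a squared-Euclidean nearest-neighbor rule, and the names `make_codebook` and `quantize`, along with the per-step seeding scheme, are hypothetical.

```python
import numpy as np

def make_codebook(seed: int, step: int, size: int, dim: int) -> np.ndarray:
    """Regenerate the codebook for one step from (seed, step) alone."""
    # Assumption: each step gets its own random stream keyed by (seed, step).
    rng = np.random.default_rng([seed, step])
    # Assumption: entries are i.i.d. standard Gaussian.
    return rng.standard_normal((size, dim))

def quantize(target: np.ndarray, codebook: np.ndarray) -> tuple[int, np.ndarray]:
    """Return the index and vector of the codebook entry closest to `target`."""
    sq_dists = np.sum((codebook - target) ** 2, axis=1)
    idx = int(np.argmin(sq_dists))
    return idx, codebook[idx]

codebook = make_codebook(seed=0, step=5, size=64, dim=8)
required = np.random.default_rng(1).standard_normal(8)  # stand-in perturbation
token, chosen = quantize(required, codebook)
```

Because the codebook is a pure function of `(seed, step)`, a decoder that knows the seed only needs `token` to recover `chosen`.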
3. Algorithms and Procedural Steps
gDDCM provides algorithmic prescriptions for both continuous and discrete-time variants:
- (Alg. 1, ): Step through the time axis, at each iteration:
- (Alg. 2, ): Use explicit DDIM inversion with codebook quantization at each discrete step.
The decoder replays the steps in reverse, using the same codebooks and indices, to reconstruct an approximation of the original image; fidelity is determined by the codebook size and the number of tokens.
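The following toy round trip illustrates why the token sequence alone is enough for the decoder. It is a sketch under loud assumptions: `toy_mean` is a hypothetical linear stand-in for the pretrained model, the noise scale `sigma` is held constant, and the encoder and decoder both run the same sampling loop (the DDCM-style arrangement), which may differ from how the paper orders its steps.

```python
import numpy as np

def codebook(seed, step, size, dim):
    # Assumption: Gaussian codebook regenerated from (seed, step).
    return np.random.default_rng([seed, step]).standard_normal((size, dim))

def toy_mean(x, step):
    # Hypothetical stand-in for the pretrained model's deterministic update.
    return 0.9 * x

def encode(x0, steps, size, seed, sigma=0.5):
    dim = x0.shape[0]
    x = np.random.default_rng([seed, steps]).standard_normal(dim)  # shared start
    tokens = []
    for step in reversed(range(steps)):
        mean = toy_mean(x, step)
        cb = codebook(seed, step, size, dim)
        needed = (x0 - mean) / sigma  # perturbation that would land on x0
        k = int(np.argmin(np.sum((cb - needed) ** 2, axis=1)))
        tokens.append(k)
        x = mean + sigma * cb[k]
    return tokens, x

def decode(tokens, dim, size, seed, sigma=0.5):
    steps = len(tokens)
    x = np.random.default_rng([seed, steps]).standard_normal(dim)
    for step, k in zip(reversed(range(steps)), tokens):
        x = toy_mean(x, step) + sigma * codebook(seed, step, size, dim)[k]
    return x

x0 = np.linspace(-1.0, 1.0, 16)
tokens, x_enc = encode(x0, steps=20, size=256, seed=0)
x_dec = decode(tokens, dim=16, size=256, seed=0)
```

The decoder never sees `x0`; it reproduces the encoder's final state because both sides regenerate identical codebooks and apply the same indices in the same order.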
4. Application to Diffusion Model Variants
gDDCM directly recovers and extends DDCM and applies to a wide array of diffusion frameworks:
| Model Variant | Forward/Backward Rule | Notes |
|---|---|---|
| DDPM (discrete/DDIM) | Eq. 11, stepwise, | DDCM as |
| Score-based/SDE | Continuous, use Eq. (21) | with predicted score |
| Consistency Model | As SDE | deterministic map |
| Rectified-Flow/ODE | Euler step via Thm 1, Eq. (21) | ODE, |
gDDCM thus enables codebook-based compression and generation in fully deterministic settings (ODE-based) and stochastic SDE settings, as well as hybrid models, by proper parameterization of the update step and quantization process (Kong, 17 Nov 2025).
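The variants in the table differ mainly in how they scale the clean image and the noise at each time. The sketch below lists standard textbook parameterizations for three families; the function names, the linear beta schedule, and the rectified-flow time convention (data at t=0, noise at t=1) are assumptions made for illustration, and the paper's own notation may differ.

```python
import numpy as np

def ddpm_coeffs(t_index, num_steps=1000, beta_min=1e-4, beta_max=0.02):
    # Variance-preserving DDPM with a linear beta schedule (assumed defaults).
    betas = np.linspace(beta_min, beta_max, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)[t_index]
    return np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)

def ve_coeffs(sigma):
    # Variance-exploding / EDM-style: signal untouched, noise scale sigma.
    return 1.0, sigma

def rectified_flow_coeffs(t):
    # Straight-line interpolation between data (t=0) and noise (t=1).
    return 1.0 - t, t

def sample_marginal(x0, scale, noise_std, rng):
    # Shared form: scaled clean image plus scaled Gaussian noise.
    return scale * x0 + noise_std * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
x0 = np.zeros((3, 32, 32))
scale, noise_std = ddpm_coeffs(500)
xt = sample_marginal(x0, scale, noise_std, rng)
```

Writing every family as a `(scale, noise_std)` pair is what lets a single tokenization routine be reused across them.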
5. Empirical Performance and Results
Extensive experiments on CIFAR-10 and LSUN Bedroom datasets demonstrate that gDDCM with (ODE-style, fully deterministic) consistently achieves superior or comparable generative fidelity and compression metrics relative to DDCM at (random or partially stochastic backward noise injection):
CIFAR-10 (N=300 tokens)
| Model, | FID | LPIPS | IS | SSIM |
|---|---|---|---|---|
| DDPM, | 7.7 | 0.138 | 9.67 | 0.93 |
| DDPM, | 3.2 | 0.060 | 10.5 | 0.98 |
| EDM, | 4.5 | 0.099 | 10.3 | 0.95 |
| EDM, | 4.3 | 0.078 | 10.9 | 0.96 |
| CM, | × (fails) | — | — | — |
| CM, | 4.3 | 0.049 | 10.1 | 0.98 |
| ReFlow, | (best) | 0.049 | 10.1 | 0.98 |
On LSUN Bedroom (), only LPIPS and SSIM were reported; gDDCM () consistently surpassed DDCM () with LPIPS 0.03 and SSIM 0.99.
Qualitatively, reconstructions from token streams are nearly indistinguishable from originals, and intermediate states () after minimal noise injection preserve most image content. For all model classes and datasets, tokens suffice for near-lossless fidelity (Kong, 17 Nov 2025).
6. Codebook Construction, Training, and Hyperparameters
- Each codebook consists of a fixed number of vectors drawn once from a standard Gaussian, and the selected indices form the compressed "token sequence." For memory efficiency, codebooks can be regenerated from a fixed seed.
- Training: gDDCM dispenses with novel loss functions; it leverages existing pretrained diffusion/consistency/flow models trained under standard objectives (denoising score-matching, etc.). The tokenization and reconstruction process is entirely algorithmic/inference-time.
- Scheduler and tuning: Optimal performance requires grid search over the step schedule and noise control parameter on a held-out set.
- Variants: Each time (or step) uses its own codebook. This is currently not amortized, though pseudo-random codebook generation curtails memory burden. Extensions to adaptive/learned codebooks or vector quantization remain unexplored.
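Because the token sequence is a list of codebook indices, its size follows from simple arithmetic. The sketch below assumes fixed-length coding with equally sized codebooks at every step; the codebook size of 1024 is a hypothetical value chosen for illustration, while 300 tokens and the 32×32 resolution match the CIFAR-10 setting.

```python
import math

def bitstream_bits(num_tokens: int, codebook_size: int) -> int:
    # Each token needs ceil(log2(K)) bits under fixed-length coding.
    return num_tokens * math.ceil(math.log2(codebook_size))

def bits_per_pixel(num_tokens: int, codebook_size: int, height: int, width: int) -> float:
    return bitstream_bits(num_tokens, codebook_size) / (height * width)

bits = bitstream_bits(300, 1024)         # 300 tokens, hypothetical K = 1024
bpp = bits_per_pixel(300, 1024, 32, 32)  # CIFAR-10 images are 32x32
print(bits, bpp)  # 3000 2.9296875
```

Doubling the codebook size adds one bit per token, whereas adding a token costs a full index, so the two knobs trade off differently against fidelity.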
7. Limitations, Open Problems, and Extension Directions
Known limitations of gDDCM include:
- The need for manual or grid-searched tuning of the step schedule and the noise-control parameter.
- For , backward noise injection and the reverse sampling process are coupled; if the reverse sampler is not optimal (e.g., due to large discretization error), compression quality may deteriorate.
- Codebook-per-time-step is required, but memory overhead is mitigated by procedural generation; increasing codebook efficiency via learning remains a promising direction.
- At high noise levels, the conditional distribution deviates substantially from the marginal distribution, suggesting that tokenization should avoid overly noisy starting points.
- Possible extensions include end-to-end joint finetuning (model and codebooks), adaptive codebook sizes, and application to multimodal data such as video.
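The tuning burden named in the first limitation amounts to an exhaustive search over candidate settings. The sketch below shows the shape of such a search; `toy_evaluate` is a hypothetical stand-in with an arbitrary objective that bears no relation to measured results, and the candidate values are placeholders. In practice it would be replaced by a held-out distortion measurement such as LPIPS.

```python
import itertools

def grid_search(evaluate, step_counts, noise_params):
    """Return (score, steps, noise_param) for the lowest score found."""
    best = None
    for steps, noise_param in itertools.product(step_counts, noise_params):
        score = evaluate(steps, noise_param)
        if best is None or score < best[0]:
            best = (score, steps, noise_param)
    return best

def toy_evaluate(steps, noise_param):
    # Arbitrary toy objective (assumption); lower is better.
    return (noise_param - 0.5) ** 2 + 1.0 / steps

score, steps, noise_param = grid_search(
    toy_evaluate, step_counts=[50, 100, 300], noise_params=[0.0, 0.5, 1.0]
)
```

The cost grows with the product of the two candidate lists, and each evaluation requires a full encode and decode pass, which is why the source lists this tuning as a limitation.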
A plausible implication is that improved codebook construction and integration with adaptive or learned quantization strategies could further enhance compression efficiency and generalization to non-image domains (Kong, 17 Nov 2025, Ohayon et al., 3 Feb 2025).