Denoising Diffusion Codebook Models

Updated 21 March 2026

DDCMs are generative and compression frameworks that discretize the stochastic noise in diffusion models using fixed Gaussian codebooks.
They leverage deterministic codeword selection to optimize residual reduction for high-fidelity image reconstruction and conditional tasks.
Turbo-DDCM and gDDCM variants enhance efficiency and applicability, achieving state-of-the-art trade-offs in perceptual image compression.

Denoising Diffusion Codebook Models (DDCMs) are a class of generative and compression frameworks that discretize the stochasticity of denoising diffusion models using finite, reproducible codebooks of Gaussian noise vectors. Because every sampling step draws its noise from a fixed codebook, each generated image corresponds to a short bitstream (the sequence of chosen codebook indices) from which it can be regenerated exactly. This supports applications in perceptual image compression, compressed sample generation, and conditional inverse problems. Recent advances include highly efficient variants (Turbo-DDCM), generalizations across diffusion families (gDDCM), and integration with conditional and region-prioritized sampling.

1. The DDCM Framework: Discretizing Diffusion Sampling

Denoising Diffusion Probabilistic Models (DDPMs) establish a Markov chain $x_0 \to x_1 \to \dots \to x_T$ where each step injects additive Gaussian noise:

$q(x_t\mid x_{t-1}) = \mathcal N \left( x_t ; \sqrt{\alpha_t}x_{t-1}, (1-\alpha_t)\mathbf I \right)$

The reverse process is learned via a neural denoiser or score model, with the canonical form:

$x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z_t, \quad z_t\sim\mathcal N(0,I)$

DDCMs replace the standard Gaussian samples $z_t$ with elements from a finite codebook $C_t = [c_t^{(1)}, \dots, c_t^{(K)}]$, $c_t^{(k)}\sim\mathcal N(0,I)$, chosen deterministically according to the current reverse path and potentially a target reconstruction $x_0$ (Ohayon et al., 3 Feb 2025, Vaisman et al., 9 Nov 2025). The update thus becomes:

$x_{t-1} = \mu_\theta(x_t, t) + \sigma_t c_t^{(k_t)}, \quad k_t \in \{1,\dots, K\}$

The full trajectory is losslessly specified by the index sequence $\{k_2,\dots,k_T\}$, yielding a constant bits-per-pixel (BPP) discrete encoding:

$\mathrm{BPP}_{\mathrm{DDCM}} = \frac{(T-1) \lceil \log_2 K \rceil}{\#\mathrm{pixels}}$

The encoder and decoder share the same codebooks and neural denoiser, ensuring reproducibility from the compressed stream.
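The codebook update and the bit-rate formula can be illustrated with a minimal NumPy sketch. It is not taken from the cited papers: the function names, the flattened `(K, d)` codebook layout, and the idea of sharing codebooks through a common random seed are assumptions made for illustration, and `mu` stands in for the output $\mu_\theta(x_t, t)$ of a real denoiser.

```python
import numpy as np

def make_codebooks(T, K, d, seed=0):
    """Draw one codebook of K i.i.d. N(0, I) vectors for each step t = 2..T.

    Assumption: encoder and decoder agree on `seed`, so both sides can
    regenerate identical codebooks instead of transmitting them.
    """
    rng = np.random.default_rng(seed)
    return {t: rng.standard_normal((K, d)) for t in range(2, T + 1)}

def ddcm_reverse_step(mu, sigma_t, codebook, k):
    """x_{t-1} = mu_theta(x_t, t) + sigma_t * c_t^{(k)}."""
    return mu + sigma_t * codebook[k]

def ddcm_bpp(T, K, num_pixels):
    """(T - 1) * ceil(log2 K) / #pixels.

    (K - 1).bit_length() is an exact integer form of ceil(log2 K) for K >= 1.
    """
    return (T - 1) * (K - 1).bit_length() / num_pixels
```

As an illustrative calculation (the values are chosen here, not reported in the sources), $T = 1000$, $K = 64$, and a $512 \times 512$ image give $999 \cdot 6 / 262144 \approx 0.023$ BPP.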

2. Noise Selection, Conditional Objectives, and Algorithmic Details

Rather than selecting codewords at random, DDCM achieves compression or conditional generation by greedily choosing the codebook entry that best serves a reconstruction objective. For image compression, the encoder chooses, at each step,

$k_t = \arg\max_{k} \langle c_t^{(k)}, r_t \rangle, \quad r_t = x_0 - \hat{x}_{0|t}(x_t)$

where $\hat{x}_{0|t}$ is the current MMSE denoiser prediction (Ohayon et al., 3 Feb 2025). This pushes the reverse process to reconstruct $x_0$ with minimal residual error, subject to the codebook discretization. For conditional tasks (e.g., restoration with condition $y$), the objective generalizes to

$k_t = \arg\min_{k} \mathcal{L}(y, x_t, c_t^{(k)})$

where $\mathcal{L}$ may encode posterior likelihood, classifier-free guidance, or arbitrary perceptual metrics.
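The two selection rules can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' code: the function names are hypothetical, the codebook is assumed to be stored as a `(K, d)` array of flattened codewords, and `loss_fn` is a placeholder for whatever objective $\mathcal{L}$ a given task defines.

```python
import numpy as np

def select_for_compression(codebook, x0, x0_hat):
    """k_t = argmax_k <c_t^{(k)}, r_t>, with r_t = x0 - x0_hat."""
    residual = (x0 - x0_hat).ravel()
    # One inner product per codeword: a (K, d) @ (d,) matrix-vector product.
    return int(np.argmax(codebook @ residual))

def select_for_condition(codebook, loss_fn):
    """k_t = argmin_k L(...), where `loss_fn` scores a single codeword."""
    losses = [loss_fn(c) for c in codebook]
    return int(np.argmin(losses))
```

Only the chosen index needs to be stored; the decoder, holding the same codebook, looks up the codeword and repeats the same reverse step.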

The DDCM sampling and compression scheme does not require modification or retraining of the underlying diffusion backbone; the selection operates entirely at inference time using upstream network predictions (Ohayon et al., 3 Feb 2025, Vaisman et al., 9 Nov 2025). A plausible implication is that DDCM is model-agnostic, provided access to noise prediction or score-based outputs.

3. Turbo-DDCM: Runtime Optimization and Multi-Atom Selection

DDCM with single-codeword selection is computationally intensive due to the large number of denoising steps (typically 500–1000). Turbo-DDCM accelerates encoding and decoding by selecting multiple codewords (atoms) per denoising step, which turns codeword combination into a constrained least-squares problem (Vaisman et al., 9 Nov 2025):

$\min_{w \in \mathbb{R}^K} \| C_t w - r_t \|_2^2 \quad \text{s.t. } \|w\|_0 = M, \; w_i \in V \cup \{0\}, \; V=\{ \pm 1 \}$

A hard-thresholding solution computes the codebook correlations $\alpha = C_t^\top r_t$, keeps the $M$ largest in absolute value, and forms a normalized sum:

Select the atom indices $S$ corresponding to the top $M$ entries of $|\alpha|$.
Set signs $w_S = \mathrm{sign}(\alpha_S)$, with $w_i = 0$ for $i \notin S$.
Output $z_t = \mathrm{normalize}(C_t w)$.
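The three steps above can be sketched as follows. This is an illustration under stated assumptions rather than the reference implementation: atoms are stored as rows of a `(K, d)` array (so the correlations are `codebook @ residual`), the name `turbo_select` is hypothetical, and `normalize` is assumed to mean rescaling the combined vector to unit per-coordinate standard deviation, which the source does not specify.

```python
import numpy as np

def turbo_select(codebook, residual, M):
    """Top-M hard-thresholding selection of signed atoms.

    Returns the sorted atom indices S, their signs, and the combined noise z.
    """
    alpha = codebook @ residual                      # K correlations, Theta(K d)
    S = np.argpartition(np.abs(alpha), -M)[-M:]      # indices of the M largest |alpha|
    S.sort()                                         # canonical (unordered-subset) form
    signs = np.where(alpha[S] >= 0, 1.0, -1.0)       # w_S = sign(alpha_S), ties -> +1
    combo = signs @ codebook[S]                      # C_t w restricted to the support
    z = combo / combo.std()                          # assumed meaning of "normalize"
    return S, signs, z
```

`np.argpartition` finds the top-$M$ entries in linear time without fully sorting the correlations, so the matrix-vector product dominates the per-step cost.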

This approach maintains residual reduction comparable to matching pursuit at only $\Theta(K d)$ operations per step, independent of $M$. It also drastically reduces the required number of denoising steps (e.g., $T$ drops from 1000 to $\approx 20$), yielding observed runtime improvements of over $40\times$ relative to DDCM (Vaisman et al., 9 Nov 2025). The bitstream encodes the lexicographic index of the unordered $M$-subset, plus coefficient quantization, which packs the information with minimal overhead.
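One standard way to index an unordered $M$-subset is the combinatorial number system, sketched below in plain Python. Whether Turbo-DDCM uses exactly this scheme is not stated in the text above, so it should be read as an illustration; the function names are hypothetical, and the bit count assumes one sign bit per selected atom (consistent with $V = \{\pm 1\}$).

```python
from math import comb

def subset_rank(indices, K):
    """Lexicographic rank of a sorted M-subset of {0, ..., K-1}."""
    M, rank, prev = len(indices), 0, -1
    for i, c in enumerate(indices):
        # Count subsets that agree on the first i picks but pick some j < c here.
        for j in range(prev + 1, c):
            rank += comb(K - 1 - j, M - 1 - i)
        prev = c
    return rank

def subset_unrank(rank, K, M):
    """Inverse of subset_rank: recover the sorted index list."""
    out, j = [], 0
    for i in range(M):
        while True:
            block = comb(K - 1 - j, M - 1 - i)   # subsets whose i-th pick is j
            if rank < block:
                break
            rank -= block
            j += 1
        out.append(j)
        j += 1
    return out

def bits_per_step(K, M):
    """ceil(log2 C(K, M)) bits for the subset plus M sign bits (assumed)."""
    return (comb(K, M) - 1).bit_length() + M
```

Indexing the subset as a whole costs $\lceil \log_2 \binom{K}{M} \rceil$ bits, which is never more than the $M \lceil \log_2 K \rceil$ bits needed to store each index separately.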

4. Extensions: gDDCM, Conditional, and Priority-Aware Variants

Generalized DDCM (gDDCM) extends the paradigm to alternative diffusion forms, including score-based SDEs, consistency models, and flow-matching (rectified flow), exploiting the shared marginal structure $x_t = s(t)x_0 + \Sigma(t)\epsilon$ (Kong, 17 Nov 2025). Noise tokenization uses nearest-neighbor codebook assignment in noise space, with variants adjusting the stage at which codebook injection occurs (pure vs. two-stage).
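A rough sketch of nearest-neighbor noise tokenization under the shared marginal form is given below. It is only an illustration: it assumes $\Sigma(t)$ is a scalar, that the clean image $x_0$ is available on the encoder side, and that the codebook is a `(K, d)` array; the function name is hypothetical, and the actual gDDCM procedure (including its pure and two-stage variants) may differ in its details.

```python
import numpy as np

def tokenize_noise(x_t, x0, s_t, sigma_t, codebook):
    """Replace the noise implied by x_t = s(t) x0 + Sigma(t) eps with its
    nearest codeword, returning the token and the re-noised state."""
    eps = (x_t - s_t * x0) / sigma_t                   # implied noise
    dist2 = np.sum((codebook - eps) ** 2, axis=1)      # squared distance to each codeword
    k = int(np.argmin(dist2))                          # nearest-neighbor token
    x_t_quantized = s_t * x0 + sigma_t * codebook[k]   # state rebuilt from the codeword
    return k, x_t_quantized
```

Because only the pair $(s(t), \Sigma(t))$ changes between diffusion families, the same routine applies to any model whose marginals take this form.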

Specialized Turbo-DDCM variants further enhance flexibility (Vaisman et al., 9 Nov 2025):

Priority-aware Turbo-DDCM reweights the residual with spatial weights, improving quality in a region of interest (ROI) without increasing the overall BPP.
Distortion-controlled Turbo-DDCM predicts PSNR from JPEG size proxies to select the BPP, reducing PSNR variance by more than 40%.
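The priority-aware idea can be sketched by weighting the residual before computing codebook correlations. This is an assumption-laden illustration: the elementwise weighting, the single-atom selection, and the name `priority_select` are choices made here, not details confirmed by the source.

```python
import numpy as np

def priority_select(codebook, residual, weights):
    """Pick the codeword best aligned with a spatially weighted residual.

    `weights` has the same shape as `residual`; larger values mark
    pixels (e.g. an ROI) whose error should dominate the selection.
    """
    weighted = weights * residual            # emphasize high-priority pixels
    return int(np.argmax(codebook @ weighted))
```

With uniform weights this reduces to the unweighted rule; raising the weights inside an ROI shifts which codeword is chosen but not how many bits are spent, which is consistent with the constant-BPP claim.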

DDCM selection objectives extend naturally to conditional and guided scenarios, yielding compressed posterior samples for image restoration, classifier guidance, and perception-optimized reconstructions, all while emitting lossless discrete code streams (Ohayon et al., 3 Feb 2025).

5. Computational Complexity, Scaling, and Practical Advantages

Classic DDCM with matching pursuit refinement has $\Theta(M 2^C K d)$ per-step complexity and requires $T=500\text{--}1000$ neural network evaluations. Turbo-DDCM reduces sampling to $T\approx 20$ steps at $\Theta(K d)$ per step. On typical image compression benchmarks, this yields a per-image runtime of 1.5 seconds on an A40 GPU, compared to 65 seconds for DDCM and 4 seconds for DiffC (Vaisman et al., 9 Nov 2025). Turbo-DDCM's runtime is nearly constant across a wide range of BPPs, and no custom kernels or hardware-specific acceleration are required.
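The quoted figures can be checked with simple arithmetic. The snippet below only restates numbers from the paragraph above (taking $T = 1000$ as the DDCM step count, the upper end of the quoted range); the variable names are illustrative.

```python
# Denoiser evaluations per image (T) and reported A40 runtimes in seconds.
ddcm_steps, turbo_steps = 1000, 20
ddcm_sec, turbo_sec, diffc_sec = 65.0, 1.5, 4.0

step_reduction = ddcm_steps / turbo_steps     # ratio of neural evaluations
speedup_vs_ddcm = ddcm_sec / turbo_sec        # wall-clock ratio against DDCM
speedup_vs_diffc = diffc_sec / turbo_sec      # wall-clock ratio against DiffC
```

The wall-clock ratio against DDCM comes out at roughly 43, in line with the "over $40\times$" figure, and the step count falls by a factor of 50.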

6. Empirical Results and Applications

Across standard test sets (Kodak24, DIV2K, CIFAR-10, LSUN Bedroom), DDCM and Turbo-DDCM deliver competitive or state-of-the-art Pareto rate–distortion–perception trade-offs (Ohayon et al., 3 Feb 2025, Vaisman et al., 9 Nov 2025, Kong, 17 Nov 2025). Representative findings:

At $\mathrm{BPP} \approx 0.05$, Turbo-DDCM achieves PSNR ≈ 24 dB, LPIPS ≈ 0.12, FID ≈ 9.5 within 1.5 seconds/image.
DDCM-encoded samples match the diversity/FID of full-precision DDPMs even with codebooks of size $K = 64$.
Perceptual compression with DDCM produces visually faithful reconstructions at bitrates down to 0.02 BPP, outperforming JPEG/BPG at ultra-low rates.
Adaptive variants recover ROIs with superior clarity and target PSNRs with substantially lower prediction error.

gDDCM yields further improvements, outperforming original DDCM across all quality metrics and unifying token-based compression across diffusion and flow-matching paradigms (Kong, 17 Nov 2025).

7. Limitations, Open Problems, and Outlook

While DDCM-based methods enable efficient, high-fidelity compressed generation, three principal limitations are noted (Vaisman et al., 9 Nov 2025, Kong, 17 Nov 2025):

Turbo-DDCM still requires tens of backbone inferences per image; a single-step, truly zero-latency compression method remains an open problem.
At high BPPs, compression performance is ultimately bounded by the expressiveness and distortion of the upstream latent diffusion encoders.
Theoretical understanding of optimal codebook size, codeword sparsity, and token-allocation schedules in the continuous-to-discrete regime is incomplete and constitutes an active area of research.

Future directions include end-to-end learning of codebook schedules, extension to multimodal and video domains, integration into standardized codecs, and deeper analysis of the relationship between discrete tokenization and score-based posterior sampling.

Key references:

Ohayon et al., "Compressed Image Generation with Denoising Diffusion Codebook Models" (Ohayon et al., 3 Feb 2025)
Vaisman et al., "Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression" (Vaisman et al., 9 Nov 2025)
Kong, "Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model" (Kong, 17 Nov 2025)