
Denoising Diffusion Codebook Models

Updated 24 November 2025
  • DDCM are generative models that combine denoising diffusion processes with discrete Gaussian noise codebooks to enable deterministic, index-based representations.
  • The models replace continuous noise with selected codebook entries to achieve state-of-the-art perceptual image compression and high-fidelity synthesis.
  • Variants like Turbo-DDCM and gDDCM accelerate encoding and extend applications to conditional generation and image restoration with improved runtime and compression efficiency.

Denoising Diffusion Codebook Models (DDCM) are a class of generative modeling and data compression architectures that combine denoising diffusion probabilistic models (DDPMs) with finite, discrete codebooks of Gaussian noise vectors. By substituting continuous random noise in the reverse diffusion process with selected elements from these codebooks, DDCMs enable deterministic, index-based representations of generative trajectories, thus achieving both high-fidelity image generation and state-of-the-art perceptual image compression. This approach generalizes across mainstream diffusion backbones and can flexibly adapt to conditional or task-specific settings (Ohayon et al., 3 Feb 2025, Vaisman et al., 9 Nov 2025, Kong, 17 Nov 2025, Chen et al., 26 Jul 2025).

1. Standard DDCM Framework and Mathematical Formulation

DDCM is rooted in the diffusion modeling paradigm. The forward process gradually corrupts a data sample $x_0$ via a Markov chain of Gaussian transitions:

$$q(x_t|x_{t-1}) = \mathcal{N}\bigl(x_t; \sqrt{\alpha_t}\,x_{t-1},\,\beta_t I\bigr), \quad \alpha_t = 1 - \beta_t,$$

with $q(x_t|x_0) = \mathcal{N}\bigl(x_t; \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I \bigr)$ and $\bar{\alpha}_t = \prod_{s=1}^t \alpha_s$.
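The closed-form marginal above means $x_t$ can be drawn in one shot without simulating the whole chain. The following is a minimal NumPy sketch of that step; the linear $\beta_t$ schedule, its endpoints, and the helper names `make_schedule` and `q_sample` are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
    alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{s<=t} alpha_s
    return betas, alphas, alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) directly; t is 1-indexed as in the text."""
    a_bar = alpha_bars[t - 1]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule()
x0 = rng.standard_normal((8, 8))         # toy stand-in for an image
x_t, eps = q_sample(x0, t=500, alpha_bars=alpha_bars, rng=rng)
```

Because $\bar{\alpha}_t$ decreases monotonically in $t$, the signal term shrinks and the noise term grows as the chain progresses.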

The reverse process, parameterized by a neural network $\epsilon_\theta(x_t, t)$ trained via simplified denoising score matching,

$$\mathcal{L}(\theta) = \mathbb{E}_{t, x_0, \epsilon}\, \|\epsilon - \epsilon_\theta(x_t, t)\|^2,\quad x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,$$

iteratively reconstructs $x_0$:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t\,z,\quad z\sim\mathcal{N}(0,I).$$

In DDCM, the standard Gaussian noise $z$ is replaced at each timestep by indexed codebook entries:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t\,e_t^{(k_t)},\quad e_t^{(k)} \sim \mathcal{N}(0,I),$$

where $\{e_t^{(1)},\ldots,e_t^{(K)}\}$ forms the codebook $C_t$ at timestep $t$.
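The codebook-driven update can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the codebooks are drawn once from a shared seed so that encoder and decoder hold identical copies, $\sigma_t^2 = \beta_t$ is assumed as the noise scale, and `eps_pred` is a placeholder for the output of $\epsilon_\theta(x_t, t)$. The names `make_codebooks` and `ddcm_reverse_step` are hypothetical.

```python
import numpy as np

def make_codebooks(T, K, shape, seed=0):
    """Fixed i.i.d. N(0, I) codebooks C_1..C_T, reproducible from a shared seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((T, K) + tuple(shape))

def ddcm_reverse_step(x_t, t, k_t, eps_pred, betas, codebooks):
    """One reverse step using codebook entry k_t instead of fresh noise (t is 1-indexed)."""
    alphas = 1.0 - betas
    a = alphas[t - 1]
    a_bar = np.prod(alphas[:t])
    sigma_t = np.sqrt(betas[t - 1])      # assumption: sigma_t^2 = beta_t
    mean = (x_t - (1.0 - a) / np.sqrt(1.0 - a_bar) * eps_pred) / np.sqrt(a)
    return mean + sigma_t * codebooks[t - 1, k_t]

T, K, shape = 10, 4, (8, 8)
betas = np.linspace(1e-4, 0.02, T)
codebooks = make_codebooks(T, K, shape, seed=123)
x_t = np.ones(shape)
eps_pred = np.zeros(shape)               # placeholder for eps_theta(x_t, t)
x_a = ddcm_reverse_step(x_t, T, 2, eps_pred, betas, codebooks)
x_b = ddcm_reverse_step(x_t, T, 2, eps_pred, betas, codebooks)
x_c = ddcm_reverse_step(x_t, T, 3, eps_pred, betas, codebooks)
```

Here `x_a` and `x_b` coincide because the same index is used, while `x_c` differs only through the selected codebook entry. The step is therefore fully determined by the index $k_t$.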

Image encoding picks, at every $t$, the index $k_t$ whose entry $e_t^{(k_t)}$ best aligns with the current residual or, more generally, is best under a task-dependent discrepancy $L(c, x_t, e_t^{(k)})$. The resulting codebook indices $\{k_t\}$ form a bitstream that the decoder reads back losslessly (Ohayon et al., 3 Feb 2025).

2. Algorithmic Structure and Compression Protocol

A DDCM encoder/decoder cycle operates as follows:

  • At compression (“encoding”), for a given real image $x_0$:

    • The reverse diffusion chain is simulated from $t=T$ to $t=1$.
    • At each step, the codebook index

    $$k_t^* = \arg\max_{k\in[K]} \langle e_t^{(k)},\, x_0 - \hat{x}_{0|t}(x_t)\rangle$$

    is chosen, where $\hat{x}_{0|t}(x_t)$ is the MMSE estimate of $x_0$ at time $t$ given $x_t$ (via the denoising network).
    • The sequence $\{k_T, \ldots, k_1\}$ forms the compressed bitstream, with each index using $\log_2 K$ bits; the total rate is thus $T\log_2 K / N_{\rm pix}$ bits per pixel (bpp).

  • At decompression (“decoding”), the same diffusion process is unrolled with the codebook entries $e_t^{(k_t)}$ added at each step, deterministically reproducing the sample generated by the encoder.

Lossy and lossless regimes are determined by $K$, $T$, and codebook subset selections. Matching pursuit can be used to build a multi-atom representation at each timestep, increasing both rate and fidelity (Ohayon et al., 3 Feb 2025, Vaisman et al., 9 Nov 2025).
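The encode/decode cycle described above can be sketched end to end with a toy denoiser. Everything here is illustrative: `toy_eps` is a stand-in that equals $\mathbb{E}[\epsilon \mid x_t]$ only when $x_0 \sim \mathcal{N}(0, I)$ (it is not a trained network), $\sigma_t^2 = \beta_t$ is assumed, the starting point $x_T$ is assumed to be shared between both sides, and the function name `run_chain` is hypothetical.

```python
import numpy as np

def toy_eps(x_t, a_bar):
    # Stand-in denoiser: exact E[eps | x_t] if x_0 ~ N(0, I). Not a trained model.
    return np.sqrt(1.0 - a_bar) * x_t

def run_chain(codebooks, x_T, betas, x0=None, indices=None):
    """Encode when x0 is given (greedy index choice); decode when indices are given."""
    T, K = codebooks.shape[:2]
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x, chosen = x_T.copy(), []
    for t in range(T, 0, -1):
        a, a_bar = alphas[t - 1], alpha_bars[t - 1]
        eps = toy_eps(x, a_bar)
        if x0 is not None:
            # Estimate of x_0 from x_t, then residual-aligned index (argmax inner product)
            x0_hat = (x - np.sqrt(1.0 - a_bar) * eps) / np.sqrt(a_bar)
            resid = (x0 - x0_hat).ravel()
            k = int(np.argmax(codebooks[t - 1].reshape(K, -1) @ resid))
        else:
            k = indices[T - t]           # indices are stored in order k_T, ..., k_1
        chosen.append(k)
        mean = (x - (1.0 - a) / np.sqrt(1.0 - a_bar) * eps) / np.sqrt(a)
        x = mean + np.sqrt(betas[t - 1]) * codebooks[t - 1, k]
    return x, chosen

T, K, shape = 50, 64, (8, 8)
betas = np.linspace(1e-4, 0.02, T)
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((T, K) + shape)   # shared by encoder and decoder
x_T = rng.standard_normal(shape)                  # shared starting point
x0 = rng.standard_normal(shape)                   # toy "image"

x_enc, indices = run_chain(codebooks, x_T, betas, x0=x0)        # encoder
x_dec, _ = run_chain(codebooks, x_T, betas, indices=indices)    # decoder
bpp = T * np.log2(K) / x0.size                                  # T log2(K) / N_pix
```

The decoder never sees `x0`; it only replays the indices, so `x_dec` matches the encoder's final sample. In this toy setting the rate is $50 \cdot 6 / 64 \approx 4.69$ bpp; as a purely arithmetical illustration, $T=1000$ and $K=64$ on a $512\times512$ pixel grid would give $6000/262144 \approx 0.023$ bpp.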

3. Turbo-DDCM: Acceleration and Flexible Encoding

Turbo-DDCM augments DDCM with computational accelerations and enhanced bitstream efficiency (Vaisman et al., 9 Nov 2025). Instead of greedy iterative matching pursuit, a closed-form sparse thresholding is employed:

  1. For each $t$, project the residual $r_t$ onto each codebook element $c_t^{(i)}$ to obtain correlations $\alpha_i$ (distinct from the noise-schedule coefficients $\alpha_t$).
  2. Select the top $M$ atoms by $|\alpha_i|$; coefficients are signed and quantized to a small set (e.g., $\pm1$).
  3. The composite noise is

$$\tilde{z}_t = \frac{Z_t w}{\mathrm{std}(Z_t w)},$$

where $Z_t$ is the codebook matrix and $w$ is the sparse coefficient vector.

  4. At each step, transmit the lexicographic index of the unordered $M$-subset (among all $\binom{K}{M}$) and the quantized signs, realizing a more compact bitstream.
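The four steps above can be sketched in NumPy as follows. This is a hedged illustration, not the Turbo-DDCM code: coefficients are assumed to be exactly $\pm 1$ (one sign bit per atom), the subset index is assumed to be stored in $\lceil \log_2 \binom{K}{M} \rceil$ bits, and all function names are hypothetical.

```python
import numpy as np
from math import comb, ceil, log2

def select_atoms(Z, r, M):
    """Z: (d, K) codebook matrix, r: (d,) residual. Top-M atoms by |correlation|."""
    corr = Z.T @ r
    idx = np.sort(np.argpartition(-np.abs(corr), M - 1)[:M])
    signs = np.where(corr[idx] >= 0, 1.0, -1.0)   # assumed +/-1 quantization
    return idx, signs

def composite_noise(Z, idx, signs):
    """Normalized sparse combination Z w / std(Z w)."""
    w = np.zeros(Z.shape[1])
    w[idx] = signs
    z = Z @ w
    return z / z.std()

def subset_rank(idx, K):
    """Lexicographic rank of a sorted M-subset of {0..K-1} among all C(K, M) subsets."""
    M, rank, prev = len(idx), 0, -1
    for j, c in enumerate(int(i) for i in idx):
        for v in range(prev + 1, c):
            rank += comb(K - 1 - v, M - 1 - j)    # subsets with smaller value v at slot j
        prev = c
    return rank

def bits_per_step(K, M):
    """Subset index bits plus one sign bit per atom (assumed layout)."""
    return ceil(log2(comb(K, M))) + M

rng = np.random.default_rng(0)
d, K, M = 256, 64, 4
Z = rng.standard_normal((d, K))
r = rng.standard_normal(d)
idx, signs = select_atoms(Z, r, M)
z_tilde = composite_noise(Z, idx, signs)
payload = (subset_rank(idx, K), signs)            # what one step would transmit
```

Coding the unordered subset as a single rank avoids spending bits on atom order: for $K=64$ and $M=4$, $\binom{64}{4} = 635376$ needs 20 bits, versus $4 \cdot 6 = 24$ bits for four separately coded indices.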

This yields substantial runtime speedups (up to $40\times$ vs. DDCM) and a 40–50% rate saving by avoiding redundant or sequential atom selection. Experiments on datasets such as Kodak24 and DIV2K demonstrate near-constant runtime per image (1.5 s on a single A40 GPU) and best-in-class perceptual metrics at low bitrates (Vaisman et al., 9 Nov 2025).

Further, Turbo-DDCM introduces:

  • Priority-aware (ROI) compression, where input spatial “importance” maps reweight residuals, focusing bits in user-specified regions.
  • Distortion-controlled compression, employing a trained predictor to select encoding rate for a desired PSNR, reducing PSNR-targeting RMSE by 40%.
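As a rough illustration of the priority-aware idea, one way to reweight residuals with an importance map is to scale the residual elementwise before computing atom correlations. The weighting rule below (normalizing the map to mean 1) and the function name `roi_correlations` are assumptions made for this sketch; the source does not specify the exact rule.

```python
import numpy as np

def roi_correlations(Z, r, importance):
    """Correlate atoms with an importance-weighted residual (hypothetical rule)."""
    w = importance / importance.mean()   # keep the average weight at 1
    return Z.T @ (w * r)

# Two toy atoms: one covering the first half of the signal, one the second half.
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
r = np.array([1.0, 1.0, 2.0, 2.0])       # residual is larger in the second half

k_uniform = int(np.argmax(roi_correlations(Z, r, np.ones(4))))
k_roi = int(np.argmax(roi_correlations(Z, r, np.array([10.0, 10.0, 1.0, 1.0]))))
```

With a uniform map the atom covering the larger residual wins, while a map that emphasizes the first half shifts the selection toward that region.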

4. Generalization to gDDCM and Alternative Diffusion Frameworks

The Generalized Denoising Diffusion Codebook Model (gDDCM) (Kong, 17 Nov 2025) unifies DDCM-style tokenization across diverse diffusion models, including DDPM, score-based SDEs, consistency models, and Rectified Flow.

All these models share the marginal form:

$$x_t = s(t)\, x_0 + \sigma(t)\, \epsilon,\quad \epsilon \sim \mathcal{N}(0,I),$$

for known schedules $s(t)$ and $\sigma(t)$.
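To make the shared marginal concrete, the sketch below lists $(s, \sigma)$ pairs for three familiar parameterizations. These are standard conventions stated here as assumptions (in particular, rectified flow is written with data at $t=0$ and noise at $t=1$); the function names are hypothetical and not taken from the gDDCM paper.

```python
import numpy as np

def ddpm_schedule(alpha_bar_t):
    """Variance-preserving DDPM: s = sqrt(alpha_bar_t), sigma = sqrt(1 - alpha_bar_t)."""
    return np.sqrt(alpha_bar_t), np.sqrt(1.0 - alpha_bar_t)

def rectified_flow_schedule(t):
    """Linear interpolation between data (t=0) and noise (t=1)."""
    return 1.0 - t, t

def ve_schedule(sigma_t):
    """Variance-exploding / EDM-style: x_t = x_0 + sigma_t * eps."""
    return 1.0, sigma_t

def noisy_sample(x0, s, sigma, eps):
    """Shared marginal x_t = s * x_0 + sigma * eps."""
    return s * x0 + sigma * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)
eps = rng.standard_normal(16)
x_ddpm = noisy_sample(x0, *ddpm_schedule(0.5), eps)
x_rf = noisy_sample(x0, *rectified_flow_schedule(0.25), eps)
x_ve = noisy_sample(x0, *ve_schedule(2.0), eps)
```

Since only $s(t)$ and $\sigma(t)$ change between families, a codebook-selection routine written against `noisy_sample` does not need to know which backbone produced the schedule.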

gDDCM alternates between deterministic reverse steps (via ODE or Euler integration of model-specific flows) and a partial noising step that injects discretized noise tokens from codebooks. The process is parameterized by $p \in [0,1]$, interpolating between no reinforcement noise ($p=0$) and the original DDCM ($p=0.5$ in DDPM). The noise at each tokenization step is chosen by proximity in the noise space to the ODE-inferred increment (Kong, 17 Nov 2025).

Empirically, $p=0$ offers optimal LPIPS, FID, and SSIM across all tested model classes, with $K=16$–$64$ codebook entries sufficing for high-fidelity reconstructions. gDDCM confirms the extensibility of codebook-based compression to all major diffusion model variants (DDIM, EDM, Consistency Models, and ReFlow), retaining or improving upon standard DDCM performance.

5. Applications in Conditional Generation and Image Restoration

DDCM and its variants are naturally extensible to conditional and restoration tasks, using codebook index selection rules rooted in task-specific objectives. The loss $L(y, x_t, e_t^{(k)})$ generalizes the codebook index choice, supporting settings such as:

  • Zero-shot Inverse Problems: For super-resolution or colorization, $L$ is typically a squared loss against the observed low-quality image or its features.
  • Blind Real-world Face Restoration: Index selection balances mean-squared error to an MMSE estimate with random diversity, optimizing for perceptual–distortion trade-offs via no-reference IQA measures.
  • Compressed Conditional/Class Guidance: The loss incorporates conditional distributions or classifier guidance (CG, CFG), enabling parallel compressed output and guidance-driven synthesis (Ohayon et al., 3 Feb 2025).
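For the zero-shot inverse-problem case, a task-dependent selection rule can be sketched as choosing the index whose candidate next state best explains the observation $y$ after degradation. The specifics are assumptions made for illustration: the degradation operator is taken to be average pooling, the loss is applied to the candidate state itself, and `avg_pool` and `select_index` are hypothetical names.

```python
import numpy as np

def avg_pool(x, f):
    """Hypothetical degradation operator A: f x f average pooling of a 2D array."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def select_index(y, mean, sigma, codebook, f):
    """Pick k minimizing ||y - A(mean + sigma * e_k)||^2 over the codebook."""
    losses = [np.sum((y - avg_pool(mean + sigma * e, f)) ** 2) for e in codebook]
    return int(np.argmin(losses))

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 8, 8))   # K = 16 entries for one timestep
mean = rng.standard_normal((8, 8))           # deterministic part of the reverse step
sigma, f, k_true = 0.5, 2, 11

# Synthetic observation generated from a known entry, so the right answer is known.
y = avg_pool(mean + sigma * codebook[k_true], f)
k_sel = select_index(y, mean, sigma, codebook, f)
```

Because the observation is built from entry `k_true`, the squared loss is zero for that index and the rule recovers it. With real data the minimum is not zero, and the chosen indices steer the trajectory toward samples consistent with $y$.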

In medical image restoration, systems such as DiffCode (Chen et al., 26 Jul 2025) integrate DDCM concepts with vector-quantized codebook priors and a latent diffusion module. The architecture employs task-adaptive codebook banks, residual quantization, and conditional latent denoising to achieve competitive PSNR and SSIM across heterogeneous restoration tasks (MRI super-resolution, CT denoising, PET synthesis), with average performance gains over strong baselines.

6. Empirical Results, Ablations, and Limitations

Empirical evaluations establish DDCM and its derivatives as state-of-the-art in perceptual image compression at low bitrates. On Kodak24, DIV2K, CLIC2020, and ImageNet256, DDCM achieves superior FID and LPIPS compared to BPG, HiFiC, PSC, PerCo, and other codecs at ≈0.1 BPP (Ohayon et al., 3 Feb 2025). Turbo-DDCM attains comparable or better LPIPS/FID than custom CUDA implementations and outperforms prior zero-shot methods, especially in perceptual quality and speed (Vaisman et al., 9 Nov 2025).

Ablation studies indicate:

  • Lower codebook size $K$ and fewer diffusion steps $T$ reduce bitrates at some perceptual cost.
  • Thresholding-based multi-atom selection in Turbo-DDCM surpasses matching-pursuit in quality/runtime trade-off.
  • Lexicographic bitstream encoding is critical for rate efficiency.
  • gDDCM consistently attains best metrics for $p=0$ across different diffusion backbones (Kong, 17 Nov 2025).

Limitations include the need for iterative reverse denoising (one-step zero-shot compression remains unresolved), dependencies on pretrained diffusion backbones, and the lack of a comprehensive rate-distortion-theoretic understanding under the diffusion prior.

7. Theoretical Interpretation and Future Directions

The codebook noise selection in DDCM can be viewed, in the infinite codebook limit ($K\rightarrow\infty$), as discretizing the probability-flow ODE corresponding to the conditional or unconditional reverse process. This forms a bridge between discrete entropy-coded diffusion trajectories and continuous posterior sampling under generative diffusion priors (Ohayon et al., 3 Feb 2025). The deterministic nature of codebook-based reverse chains also enables precise and reproducible reconstruction for compression and restoration tasks.

Open directions identified include:

  • One-step, non-iterative zero-shot compression.
  • Improved latent diffusion models to surpass the encoder–decoder distortion bound at high BPP.
  • Development of rate–distortion theory under DDCM/gDDCM frameworks.

Overall, DDCM and its generalizations offer a principled route to inject discrete, index-based information control into generative diffusion frameworks, with broad implications for compressed generation, flexible conditional modeling, and efficient, task-adaptive restoration (Ohayon et al., 3 Feb 2025, Vaisman et al., 9 Nov 2025, Kong, 17 Nov 2025, Chen et al., 26 Jul 2025).
