
Score-Based Models: Generative & Codebook Methods

Updated 24 November 2025
  • Score-based models are generative frameworks that leverage score functions—the gradients of log-density—to guide tasks like generation, restoration, and compression.
  • The family encompasses methods such as DDPMs, SDE-based frameworks, and DDCMs, with the codebook variants enabling deterministic tokenization and compressed representations.
  • Advanced variants like gDDCM and Turbo-DDCM achieve high-fidelity reconstructions and significant runtime speedups, demonstrating improved FID, LPIPS, and SSIM metrics on image benchmarks.

Score-based models constitute a class of generative models that use score functions—parameterizations of the gradient of the log-density of data distributions—to guide generation, restoration, and compression tasks via stochastic differential equations or discrete iterative denoising procedures. These models subsume denoising diffusion probabilistic models (DDPMs), stochastic differential equation (SDE)-based frameworks, and are now unified with discrete codebook approaches such as Denoising Diffusion Codebook Models (DDCMs) and their generalizations. Their recent evolution has introduced explicit codebook-driven tokenizations, enabling compressed generation and flexible downstream applications while retaining the expressiveness and theoretical guarantees of continuous score matching.

1. Foundational Architecture of Score-Based Models

Score-based generative frameworks derive their core principle from estimating the score function $\nabla_{x} \log p(x)$, which serves as a parameterization of the local structure of the data distribution. Early approaches implement this via denoising-based stochastic processes, either in a discrete-time fashion (as in DDPM) or in continuous time through SDEs and ODEs. The reversible diffusion process is defined by forward noising (adding Gaussian noise at each step) and reverse denoising (synthesizing or reconstructing data by removing noise based on a learned denoiser or score network).

A standard forward diffusion step for clean data $x_0$ is given by:

$$x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

where $\{\alpha_t\}$ is a decreasing noise schedule and $\bar\alpha_t$ the corresponding cumulative product. The reverse process produces $x_{t-1}$ from $x_t$ either through an explicit score-based sampler or as a step of a learned denoising function with additive noise.
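A minimal PyTorch sketch of this forward step and the standard noise-prediction training objective follows; the `denoiser` network and the `alpha_bar` schedule are placeholders for illustration, not any cited paper's implementation.

```python
import torch
import torch.nn.functional as F

def forward_diffuse(x0, t, alpha_bar):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps for a batch."""
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, 1, 1, 1)              # broadcast over (B, C, H, W)
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * eps
    return xt, eps

def ddpm_loss(denoiser, x0, alpha_bar):
    """Epsilon-prediction objective: MSE between injected and predicted noise."""
    t = torch.randint(0, alpha_bar.numel(), (x0.shape[0],), device=x0.device)
    xt, eps = forward_diffuse(x0, t, alpha_bar)
    return F.mse_loss(denoiser(xt, t), eps)
```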

2. Extension: Denoising Diffusion Codebook Models (DDCMs) and Generalizations

DDCMs modify the conventional continuous noise injection in the reverse process by replacing the Gaussian noise at each denoising step with a token drawn from a finite, fixed codebook of i.i.d. Gaussian vectors. The noise in every backward step, $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t \varepsilon$, is thus deterministically selected from a codebook $\mathcal{C}_t = \{z_t^{(k)}\}$ rather than sampled afresh, based on a selection rule tailored for the specific task (e.g., reconstruction-error minimization or conditional generation) (Ohayon et al., 3 Feb 2025). The entire sequence of codebook indices forms a compressed representation, or bit-stream encoding, of the generated or compressed data.

Empirical findings demonstrate that very small codebooks (e.g., $K = 64$) minimally affect perceptual quality and FID measures for unconditional generation and compression tasks, with less than 5% degradation compared to full Gaussian-noise samplers (Ohayon et al., 3 Feb 2025).
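To make the implied bitrate concrete (our own arithmetic, assuming a $T = 1000$-step sampler rather than a figure from the paper): each step transmits one of $K$ indices, i.e., $\log_2 K$ bits, so

$$T \log_2 K = 1000 \cdot \log_2 64 = 6000 \ \text{bits} \approx 0.73\ \text{KiB per image.}$$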

Codebook token selection for compression is formally:

$$k_t = \arg\max_{k \in \{1, \dots, K\}} \left\langle z_t^{(k)},\, x_0 - \hat{x}_{0|t} \right\rangle$$

where $\hat{x}_{0|t}$ is the MMSE estimate of $x_0$ given $x_t$, guiding the denoising toward better reconstruction.
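In code, this rule amounts to a maximum-inner-product search over the per-step codebook. The sketch below assumes a precomputed MMSE estimate `x0_hat`; it illustrates the rule rather than reproducing the reference implementation of (Ohayon et al., 3 Feb 2025).

```python
import torch

def ddcm_select_token(x0, x0_hat, codebook):
    """Pick the atom best aligned with the residual x0 - x0_hat.

    codebook: (K, D) tensor of fixed i.i.d. Gaussian atoms for this timestep.
    Returns the token index k_t and the selected atom z_t^{(k)}.
    """
    residual = (x0 - x0_hat).flatten()
    scores = codebook @ residual            # inner products <z_k, x0 - x0_hat>
    k = int(torch.argmax(scores))
    return k, codebook[k]

def ddcm_reverse_step(mu, sigma_t, atom):
    """DDCM reverse update: x_{t-1} = mu_theta(x_t, t) + sigma_t * z_k."""
    return mu + sigma_t * atom.view_as(mu)
```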

3. Generalized Denoising Diffusion Codebook Model (gDDCM) and Expansion to SDEs, ODEs, and Flow Matching

DDCMs were initially limited to discrete-time DDPM dynamics; the Generalized Denoising Diffusion Codebook Model (gDDCM) extends this compression and tokenization methodology to all major diffusion variants, including continuous-time score-based SDEs, Consistency Models, and Rectified Flow (flow-matching approaches) (Kong, 17 Nov 2025).

The critical abstraction is that, for all of these models, the marginal at time $t$ can be written as:

$$x_t = s(t)\, x_0 + \sigma(t)\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

This structural unification allows a two-step procedure in gDDCM: (1) propagating deterministically with an ODE-style update, and (2) re-injecting codebook noise via a selection rule. An interpolation parameter $p \in [0, 1]$ tunes between ODE reversal and direct codebook reinsertion, with $p = 0$ yielding maximum compression. Ablation studies show $p = 0$ consistently gives the best LPIPS and FID on real image datasets (Kong, 17 Nov 2025).
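The following PyTorch sketch illustrates this two-step structure under the unified marginal above; the callables `s` and `sigma`, the MMSE estimate `x0_hat`, and the alignment-based selection rule are our own placeholders, not the exact update of (Kong, 17 Nov 2025).

```python
import torch

def gddcm_step(xt, t_cur, t_next, s, sigma, x0_hat, codebook, p):
    """Schematic gDDCM step under x_t = s(t) x0 + sigma(t) eps.

    (1) Recover the noise direction implied by x_t and the MMSE estimate x0_hat.
    (2) Re-noise to level t_next, mixing that direction with the best codebook
        atom: p = 1 keeps pure ODE reversal, p = 0 reinjects codebook noise
        only (maximum compression).
    """
    eps_hat = (xt - s(t_cur) * x0_hat) / sigma(t_cur)   # implied noise at t_cur
    scores = codebook @ eps_hat.flatten()               # align atoms with eps_hat
    z = codebook[torch.argmax(scores)].view_as(xt)
    eps_next = p * eps_hat + (1.0 - p) * z              # interpolate noise sources
    return s(t_next) * x0_hat + sigma(t_next) * eps_next
```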

gDDCM outperforms baseline DDCM and standard diffusion samplers without tokenization on image reconstruction, achieving FID scores as low as 3.2 on CIFAR-10, with similar improvements for LPIPS and SSIM on both CIFAR-10 and LSUN Bedroom benchmarks.

| Model | $p$ | FID ↓ | LPIPS ↓ | IS ↑ | SSIM ↑ |
|---|---|---|---|---|---|
| DDPM | 0.5† | 7.7 | 0.138 | 9.67 | 0.93 |
| DDPM (gDDCM) | 0.0 | 3.2⋆ | 0.060⋆ | 10.50‡ | 0.98‡ |
| EDM | 0.5 | 4.5 | 0.099 | 10.3 | 0.95 |
| EDM (gDDCM) | 0.0 | 4.3 | 0.078 | 10.9 | 0.96 |

⋆ best overall; ‡ second best (Kong, 17 Nov 2025).

The generalization to continuous models enables sampling with modern ODE and SDE solvers, benefiting from state-of-the-art denoising and acceleration techniques.

4. Algorithmic Efficiency: Turbo-DDCM and Multi-Atom Compression

While DDCM codebookization enables compression, selecting a single atom per step requires hundreds of denoising steps per sample, incurring high computational cost. Turbo-DDCM addresses this inefficiency by allowing a sparse combination of many (up to hundreds of) codebook atoms at each step, approximating the residual faster (Vaisman et al., 9 Nov 2025). The combinatorial code assignment is solved via thresholding on atom–residual inner products and a quantized coefficient search, reducing the number of denoising steps to $T \approx 20$ and yielding a $\times 40$–$50$ runtime speedup over the original DDCM, while preserving or improving rate–distortion–perception tradeoffs.
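As a sketch of how the multi-atom assignment might look, consider the code below; the thresholding and quantization here are our own simplifications, not the exact procedure of (Vaisman et al., 9 Nov 2025).

```python
import torch

def turbo_select_atoms(residual, codebook, m, coeff_levels):
    """Pick the m atoms with the largest |<z_k, r>| and quantize their coefficients.

    residual: (D,) target residual; codebook: (K, D) Gaussian atoms;
    coeff_levels: (L,) allowed coefficient values forming the quantized grid.
    Since atoms are i.i.d. N(0, I), <z, z> is roughly D, so score / D
    approximates each atom's least-squares coefficient. Purely illustrative.
    """
    scores = codebook @ residual                      # atom-residual inner products
    idx = torch.topk(scores.abs(), m).indices         # threshold by magnitude
    coeffs = scores[idx] / codebook.shape[1]          # approximate projections
    nearest = (coeffs[:, None] - coeff_levels[None, :]).abs().argmin(dim=1)
    q = coeff_levels[nearest]                         # quantized coefficients
    approx = (q[:, None] * codebook[idx]).sum(dim=0)  # sparse multi-atom residual
    return idx, q, approx
```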

The encoding protocol efficiently transmits atom indices and coefficients as bit streams, offering bitrate control and region-of-interest or distortion-based adaptive compression (Vaisman et al., 9 Nov 2025).

| Method | Runtime (s/img) | PSNR / LPIPS / FID (Kodak, 0.07–0.2 BPP) |
|---|---|---|
| DDCM | ~65 | as reported in (Ohayon et al., 3 Feb 2025), Table 2 |
| Turbo-DDCM | ~1.5 | matches or slightly improves on DDCM |

Turbo-DDCM supports plug-in use with any pre-trained diffusion backbone and introduces region and distortion control via modified coefficients and regression on JPEG file size (Vaisman et al., 9 Nov 2025).

5. Compressed Conditional Generation, Restoration, and Posterior Approximation

The codebook-based frameworks naturally allow for compressed conditional generation and restoration, by modifying the codebook token selection to minimize task-specific losses at each step. For inverse problems such as colorization, super-resolution, and blind face restoration, token assignments can be optimized for consistency with measurements or perceptual quality metrics, yielding compressed bitstream outputs alongside high-fidelity reconstructed images (Ohayon et al., 3 Feb 2025).

A key theoretical result is that, as the codebook size $K \to \infty$, token selection by proximity to the posterior score $\nabla_{x_t} \log p(x_t \mid y)$ yields reverse updates that converge to the probability-flow ODE of the true posterior. This connects the codebook-augmented diffusion framework to continuous score-based Bayesian inference (Ohayon et al., 3 Feb 2025).
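For intuition, a brute-force version of conditional token selection for a linear inverse problem might look as follows; `x0_from` (a one-step clean-image estimate) and the forward operator `H` are hypothetical stand-ins, and practical implementations avoid the per-atom loop.

```python
import torch

def select_token_for_restoration(mu, sigma_t, codebook, x0_from, H, y):
    """Brute-force conditional selection: try each atom, form the candidate
    next iterate, and keep the one whose implied clean estimate best matches
    the measurement y = H(x0).

    x0_from: callable mapping a candidate iterate to an estimate of x0;
    H: known forward operator (e.g., downsampling for super-resolution).
    """
    best_k, best_loss = 0, float("inf")
    for k in range(codebook.shape[0]):
        cand = mu + sigma_t * codebook[k].view_as(mu)     # candidate x_{t-1}
        loss = torch.norm(H(x0_from(cand)) - y).item()    # measurement consistency
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k
```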

6. Extensions: Task-Adaptive Latent Codebooks and Medical Restoration

Recent models such as DiffCode extend vector-quantized diffusion priors into the domain of medical image restoration via latent diffusion-enhanced codebook priors and residual quantization (Chen et al., 26 Jul 2025). In these frameworks, a bank of task-adaptive codebooks (e.g., one per modality or restoration problem) is coupled with a latent-space diffusion model to refine feature retrieval, aligned with the ground-truth prior. This compensates for diverse information loss across tasks and enables integration in all-in-one restoration models.
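The retrieval pattern underlying such codebook priors can be sketched as greedy residual quantization against a bank of per-task codebooks; the code below is a generic illustration of that pattern, not DiffCode's architecture.

```python
import torch

def residual_quantize(feat, codebooks):
    """Greedy residual quantization against a stack of codebooks.

    feat: (D,) latent feature; codebooks: list of (K, D) tensors, e.g., a bank
    selected per task or modality. Each stage retrieves the code nearest to the
    remaining residual; the quantized feature is the sum of retrieved codes.
    """
    residual = feat.clone()
    quantized = torch.zeros_like(feat)
    indices = []
    for cb in codebooks:
        k = int(torch.cdist(residual[None, :], cb).argmin())  # nearest code
        indices.append(k)
        quantized = quantized + cb[k]
        residual = residual - cb[k]
    return quantized, indices
```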

DiffCode achieves superior metrics (e.g., 34.4605 dB PSNR and 0.9121 SSIM in low-dose CT denoising, versus the prior state of the art) by combining task-adaptive VQ codebook priors, lightweight latent diffusion, and a final restoration backbone. The approach demonstrates that iterative diffusion not only enables generative synthesis but also refines retrieval in heterogeneous, high-complexity inverse problems (Chen et al., 26 Jul 2025).

7. Future Directions and Limitations

gDDCM and Turbo-DDCM still require careful hyperparameter search for optimal tokenization schedules, and their iterative nature, although much reduced in cost, still trades off against one-step or distilled schemes. Classical DDCM is less amenable to continuous-time and accelerated sampling, which motivated the generalized and multi-atom methodologies.

Open challenges include adaptation to very high-resolution images, design of non-Gaussian or learned-codebook priors, and efficient transfer to conditional and multimodal tasks such as text-to-image generation. Hierarchical, adaptive, or hybrid codebook/diffusion structures and further one-step zero-shot analogues represent active areas for extension (Kong, 17 Nov 2025).


Score-based models, particularly in codebook-augmented and generalized forms, have established a unified and tunable interface for compressed generative modeling, restoration, and conditional sampling, combining rigorous probabilistic theory with practical compression and task-driven flexibility (Kong, 17 Nov 2025, Vaisman et al., 9 Nov 2025, Ohayon et al., 3 Feb 2025, Chen et al., 26 Jul 2025).
