
Conditional Denoising Diffusion Implicit Model

Updated 15 June 2026
  • CDDIM is a deterministic extension of DDIM that integrates explicit conditioning to solve inverse, inpainting, and imputation problems.
  • It employs mask-based, context embedding, and projection methods to enforce hard or soft constraints during the reverse diffusion process.
  • The model achieves significant speed-ups and improved reproducibility across various domains including medical imaging and seismic reconstruction.

A Conditional Denoising Diffusion Implicit Model (CDDIM) is an extension of the Denoising Diffusion Implicit Model (DDIM) framework, incorporating explicit conditioning to solve inverse, inpainting, imputation, or constrained generation problems. CDDIMs replace the stochastic DDPM reverse process with a deterministic or pseudo-deterministic sampler and integrate conditioning signals via architectural, loss-based, or mask-based mechanisms. This allows controllable, fast, and stable conditional sampling in high-dimensional generative tasks across medical imaging, tabular imputation, geoscience, seismic reconstruction, and more. CDDIMs have distinct instantiations in each application domain, but all rely on parameterizing the reverse diffusion trajectory with learned conditional noise predictors and deterministic, non-Markovian backward maps.

1. Diffusion Model Foundations and DDIM Formalism

Denoising diffusion probabilistic models (DDPMs) leverage a forward process where Gaussian noise is added to data in $T$ steps:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\big),$$

with noise schedule $\{\beta_t\}$ and cumulative product $\alpha_t = \prod_{i=1}^{t}(1-\beta_i)$. The marginal is

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\alpha_t}\, x_0,\ (1-\alpha_t) I\big).$$

DDPMs train a network $\epsilon_\theta(x_t, t)$ to predict the noise term $\epsilon$ added at time $t$ via a simple weighted mean-squared error (MSE) loss.
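The closed-form marginal means a noised sample at any step can be drawn directly from $x_0$ without simulating the chain. The following is a minimal NumPy sketch of that reparameterization; the linear schedule values, the 0-based step index, and the function names `make_alphas` and `q_sample` are illustrative assumptions, not taken from any of the cited works.

```python
import numpy as np

def make_alphas(betas):
    """Cumulative products alpha_t = prod_{i<=t} (1 - beta_i)."""
    return np.cumprod(1.0 - np.asarray(betas, dtype=float))

def q_sample(x0, t, alphas, rng):
    """Draw x_t ~ q(x_t | x_0) and return it together with the noise used.

    t is a 0-based index into alphas (an indexing convention assumed here).
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alphas[t]) * x0 + np.sqrt(1.0 - alphas[t]) * eps
    return x_t, eps

# Assumed linear noise schedule with T = 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
alphas = make_alphas(betas)

rng = np.random.default_rng(0)
x0 = np.ones(4)
x_t, eps = q_sample(x0, 500, alphas, rng)
```

The returned `eps` is the regression target for the noise predictor during training.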

DDIM (implicit) sampling reconstructs data by reversing the diffusion process with a deterministic map:

$$x_{t-1} = \sqrt{\alpha_{t-1}}\, \hat x_0(x_t, t) + \sqrt{1 - \alpha_{t-1}}\, \epsilon_\theta(x_t, t),$$

where $\hat x_0(x_t, t) = \frac{x_t - \sqrt{1-\alpha_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\alpha_t}}$. Setting the noise-injection scale to zero ($\sigma_t = 0$ in the usual DDIM notation) yields a strictly deterministic sampling trajectory, advantageous for reproducibility and low-variance outputs (Zhou et al., 5 Aug 2025). Subsampling of inference steps further accelerates sampling relative to DDPM.
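The deterministic update can be sketched in a few lines. In the example below the trained network is replaced by the true noise (an oracle, used purely for illustration), in which case the step lands exactly on the forward marginal at the lower noise level; the function names and the numeric values of `alpha_t` and `alpha_prev` are assumptions.

```python
import numpy as np

def predict_x0(x_t, eps_hat, alpha_t):
    """Clean-sample estimate implied by a noise prediction."""
    return (x_t - np.sqrt(1.0 - alpha_t) * eps_hat) / np.sqrt(alpha_t)

def ddim_step(x_t, eps_hat, alpha_t, alpha_prev):
    """Deterministic DDIM update (no noise injection)."""
    x0_hat = predict_x0(x_t, eps_hat, alpha_t)
    return np.sqrt(alpha_prev) * x0_hat + np.sqrt(1.0 - alpha_prev) * eps_hat

# Oracle check: use the true noise in place of a trained predictor.
rng = np.random.default_rng(0)
x0 = np.array([0.3, -1.2, 2.0])
eps = rng.standard_normal(x0.shape)
alpha_t, alpha_prev = 0.2, 0.6            # assumed cumulative products
x_t = np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * eps

x_prev = ddim_step(x_t, eps, alpha_t, alpha_prev)
expected = np.sqrt(alpha_prev) * x0 + np.sqrt(1.0 - alpha_prev) * eps
```

Because the step contains no random draw, repeating it with the same inputs gives the same output, which is the reproducibility property the text refers to.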

2. Conditioning Mechanisms in CDDIM

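CDDIMs introduce domain-specific conditioning to guide generation toward desired targets or constraints without fundamentally altering the DDIM formulation. Common mechanisms include mask-based mixing, context embedding or input concatenation of the conditioning signal, and projection onto the constraint set.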

The objective in each case is to guarantee that the conditional input or constraint is satisfied either exactly (hard constraint) or distributionally (soft constraint).

3. Algorithms and Architectural Instantiations

The CDDIM framework can be instantiated with network and sampling architectures fitted to the modality and conditional structure:

| Domain | Conditioning Input | Network Backbone | Key Mechanism |
|---|---|---|---|
| Lesion filling/synthesis | Lesion mask, modalities | 2D U-Net w/ time & mask channels | Mask concat + region mixing |
| Tabular imputation | Obs. values, binary mask | Feature-wise Transformer + MLP | Input concat, deterministic |
| Facies/geology simulation | Well mask, facies code | U-Net with conditional input stack | Masked update + concat |
| Seismic trace interpolation | Known trace mask, values | U-Net + self-attention | Hard value projection |
| Noisy linear inverse problems | Forward operator, obs. | Pretrained DDIM net | Projective or Lagrangian sampling |

As an example, in multiple sclerosis lesion filling (MSRepaint), the model predicts noise from the current noised volume and lesion mask. After each DDIM update, the predicted "denoised" image is mixed within selected regions while frozen elsewhere, and a forward step re-injects noise, enforcing boundary continuity (Zhang et al., 2 Oct 2025). For tabular imputation (MissDDIM), the forward process noises only missing entries and the reverse path deterministically imputes, conditioned on observed features and mask (Zhou et al., 5 Aug 2025). In geomodelling (DiffSIM), known wells are masked in at every reverse step, guaranteeing that sampled geology honors observed facies (Xu et al., 7 Mar 2026).
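To make the tabular case concrete, here is a hedged NumPy sketch of noising only the missing entries and assembling a concatenated conditioning input. It is an illustration of the idea described above, not the MissDDIM implementation: the mask convention (1 = observed), the zero-filling of unobserved values, and the function names are all assumptions.

```python
import numpy as np

def noise_missing_only(x0, obs_mask, alpha_t, rng):
    """Noise entries where obs_mask == 0; keep observed entries clean."""
    eps = rng.standard_normal(x0.shape)
    noised = np.sqrt(alpha_t) * x0 + np.sqrt(1.0 - alpha_t) * eps
    x_t = np.where(obs_mask == 1, x0, noised)
    return x_t, eps

def build_network_input(x_t, x0, obs_mask):
    """Concatenate the partially noised row, observed values, and the mask."""
    observed = x0 * obs_mask          # zeros stand in for unobserved values
    return np.concatenate([x_t, observed, obs_mask], axis=-1)

rng = np.random.default_rng(0)
x0 = np.array([[0.5, -1.0, 2.0]])       # one table row with three features
obs_mask = np.array([[1.0, 0.0, 1.0]])  # the middle feature is missing

x_t, eps = noise_missing_only(x0, obs_mask, alpha_t=0.3, rng=rng)
net_in = build_network_input(x_t, x0, obs_mask)
```

Feeding the mask alongside the values lets the network distinguish a true zero from a missing entry.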

4. Loss Functions and Training Strategies

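All CDDIMs adopt a noise-prediction loss. Writing $c$ for the conditioning input (mask, observed values, or context) and $x_t = \sqrt{\alpha_t}\, x_0 + \sqrt{1-\alpha_t}\, \epsilon$, it has the generic form:

$$\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, \epsilon,\, t}\Big[\big\| \epsilon - \epsilon_\theta(x_t, t, c) \big\|_2^2\Big],$$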

with possible per-sample or per-voxel weights to focus learning capacity (e.g., on lesions or masked regions) (Zhang et al., 2 Oct 2025). No adversarial or perceptual components are required. In constrained or hard-conditioning CDDIMs, losses are often masked to apply only outside hard constraint regions (Xu et al., 7 Mar 2026).
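A weighted and masked variant of the noise-prediction loss might look like the sketch below. Normalizing by the total weight is one reasonable design choice assumed here, not something the cited papers are known to prescribe, and the function name and arguments are hypothetical.

```python
import numpy as np

def noise_prediction_loss(eps, eps_hat, weights=None, loss_mask=None):
    """Weighted mean of squared noise-prediction errors.

    weights:   optional per-sample or per-voxel weights (broadcastable).
    loss_mask: optional 0/1 array; entries with 0 are excluded from the loss,
               e.g. locations fixed by a hard constraint.
    """
    sq_err = (eps - eps_hat) ** 2
    if weights is None:
        w = np.ones_like(sq_err)
    else:
        w = np.broadcast_to(weights, sq_err.shape).astype(float)
    if loss_mask is not None:
        w = w * loss_mask
    total = w.sum()
    if total == 0:
        return 0.0
    return float((w * sq_err).sum() / total)

eps = np.array([1.0, 2.0, 3.0, 4.0])
eps_hat = np.array([1.0, 0.0, 3.0, 0.0])

plain = noise_prediction_loss(eps, eps_hat)
masked = noise_prediction_loss(eps, eps_hat, loss_mask=np.array([1.0, 1.0, 0.0, 0.0]))
```

Here the squared errors are `[0, 4, 0, 16]`, so the unmasked mean is 5.0 and restricting the loss to the first two entries gives 2.0.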

Practical training enhancements include contrast dropout in imaging (randomly zeroing input modalities), self-masking in tabular data (randomizing pseudo-missing patterns to enable imputation of arbitrary missing structures), and multi-view inversion plus fusion for 3D consistency in volumetric data (Zhang et al., 2 Oct 2025, Zhou et al., 5 Aug 2025).
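Self-masking can be sketched as hiding a random subset of the observed entries and treating them as imputation targets during training. The hiding probability, the mask convention (1 = observed), and the function name below are assumptions made for illustration.

```python
import numpy as np

def self_mask(obs_mask, hide_prob, rng):
    """Split observed entries into conditioning entries and pseudo-missing targets.

    Returns (cond_mask, target_mask); both are 0/1 arrays shaped like obs_mask.
    Truly missing entries (obs_mask == 0) are never selected as targets.
    """
    hide = (rng.random(obs_mask.shape) < hide_prob) & (obs_mask == 1)
    cond_mask = obs_mask * (~hide)
    target_mask = hide.astype(obs_mask.dtype)
    return cond_mask, target_mask

rng = np.random.default_rng(0)
obs_mask = np.array([[1, 1, 0, 1],
                     [1, 0, 1, 1]])
cond_mask, target_mask = self_mask(obs_mask, hide_prob=0.5, rng=rng)
```

Resampling the hidden subset at every training step exposes the model to many missingness patterns, which is what allows imputation under arbitrary patterns at inference.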

5. Inference Procedures and Acceleration

Conditional DDIM samplers iteratively denoise noisy data using the trained noise predictor $\epsilon_\theta$, integrating conditional constraints at each time step. For hard constraints, each intermediate reverse iterate is projected:

$$x_{t-1} \leftarrow m \odot x_{t-1}^{\text{known}} + (1 - m) \odot x_{t-1},$$

where $m$ is the constraint mask, taken here to equal 1 at constrained locations, and $x_{t-1}^{\text{known}}$ denotes the known values at the noise level of step $t-1$ (Wei et al., 2023). Projection or mixing is repeated as necessary to ensure constraint satisfaction, and for image domains, multiple orientation passes (axial, coronal, sagittal) can be fused for improved 3D consistency (Zhang et al., 2 Oct 2025).
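A hard-constraint sampler of this kind can be sketched as a DDIM loop with a masked replacement after every update. The sketch below is an illustration under stated assumptions rather than any cited method: the noise predictor is a hypothetical stand-in that always returns zeros, the mask uses 1 for known entries, and the known values are re-noised to the current level with fresh noise (a RePaint-style choice; fixing that noise would make the whole loop deterministic).

```python
import numpy as np

def ddim_step(x_t, eps_hat, alpha_t, alpha_prev):
    """Deterministic DDIM update from noise level alpha_t to alpha_prev."""
    x0_hat = (x_t - np.sqrt(1.0 - alpha_t) * eps_hat) / np.sqrt(alpha_t)
    return np.sqrt(alpha_prev) * x0_hat + np.sqrt(1.0 - alpha_prev) * eps_hat

def sample_with_hard_constraints(eps_model, known, mask, alphas, timesteps, rng):
    """Reverse loop over decreasing 0-based timesteps with per-step projection."""
    x = rng.standard_normal(known.shape)
    for i, t in enumerate(timesteps):
        # After the last step the target noise level is zero (alpha = 1).
        alpha_prev = alphas[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        x = ddim_step(x, eps_model(x, t), alphas[t], alpha_prev)
        # Bring the known values to the same noise level, then overwrite.
        noise = rng.standard_normal(known.shape)
        known_level = np.sqrt(alpha_prev) * known + np.sqrt(1.0 - alpha_prev) * noise
        x = mask * known_level + (1.0 - mask) * x
    return x

def zero_eps_model(x, t):
    """Hypothetical stand-in for a trained noise predictor."""
    return np.zeros_like(x)

alphas = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
timesteps = [999, 749, 499, 249, 0]
known = np.array([1.0, 0.0, -2.0, 0.0])
mask = np.array([1.0, 0.0, 1.0, 0.0])

sample = sample_with_hard_constraints(
    zero_eps_model, known, mask, alphas, timesteps, np.random.default_rng(0)
)
```

Because the final projection happens at zero noise, the returned sample matches the known values exactly wherever the mask is 1.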

Inference-step sub-sampling and deterministic updates result in significant acceleration. For example, MSRepaint processes a volume in 3 min (RTX-5000), faster than prior diffusion inpainting methods (Zhang et al., 2 Oct 2025). MissDDIM achieves single-pass imputation at inference, versus 100-fold ensemble aggregation for stochastic DDPMs (Zhou et al., 5 Aug 2025). In CDIM (general linear inverse problems), the cost of the deterministic sampler in network calls is set by the number of skip-steps and inner update steps, delivering a marked acceleration compared to legacy conditional samplers (Jayaram et al., 2024).
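Step sub-sampling amounts to running the reverse map on a short, decreasing subsequence of the training timesteps. The sketch below picks evenly spaced steps; the step counts (1000 training steps, 50 inference steps) and the function name are illustrative assumptions and do not reproduce the settings of any cited method.

```python
import numpy as np

def subsample_timesteps(num_train_steps, num_inference_steps):
    """Evenly spaced 0-based timesteps, returned in decreasing order."""
    steps = np.linspace(0, num_train_steps - 1, num_inference_steps)
    return np.unique(np.round(steps).astype(int))[::-1]

timesteps = subsample_timesteps(num_train_steps=1000, num_inference_steps=50)
calls_saved = 1000 / len(timesteps)   # ratio of network evaluations
```

With one network evaluation per step, this schedule needs 50 evaluations instead of 1000, a 20-fold reduction before any inner update steps are counted.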

6. Application Domains and Empirical Performance

CDDIMs have demonstrated strong performance in diverse, domain-specific conditional generation tasks:

  • Medical imaging: Bidirectional lesion filling and synthesis with regionwise precision and clinically practical runtime, outperforming classical FSL and NiftySeg and matching FastSurfer-LIT with an order-of-magnitude speedup (Zhang et al., 2 Oct 2025).
  • Tabular data imputation: Deterministic imputation with lower RMSE and inference time than CSDI and MissDiff, with output variance reduced to zero and strong downstream F1/MAE (Zhou et al., 5 Aug 2025).
  • Geostatistical simulation: Geological realizations conditioned on sparse well data preserve both global distributional statistics and hard constraints, enabling between-well interpolation under exact constraint at input locations (Xu et al., 7 Mar 2026).
  • Seismic interpolation: Efficient, structure-preserving interpolation of missing seismic traces with coherence-corrected resampling, robust under variable missing patterns, with order-seconds inference at scale (Wei et al., 2023).
  • Linear inverse imaging: CDIM enforces hard or soft measurement constraints with minimal cost, yielding FID and LPIPS competitive with state-of-the-art samplers while running 20–50× faster (Jayaram et al., 2024).

Key factors include the ability to condition explicitly (masks, observed values), deterministic inference for repeatability, and fast low-step sampling via DDIM sub-sampling.

7. Theoretical Guarantees, Limitations, and Extensions

The deterministic nature of CDDIM reveals several analytical and practical properties:

  • Exact constraint satisfaction: For noiseless problems and linear constraints, CDDIMs can achieve exact recovery in the limit: the projection becomes affine and yields the constrained solution (Jayaram et al., 2024). With sufficient inner optimization steps, this can be realized numerically.
  • Stable, low-variance outputs: Absence of stochasticity in the reverse map ensures high repeatability and reduces the need for ensemble or median voting (Zhou et al., 5 Aug 2025).
  • Flexibility in conditioning: Depending on architecture, CDDIMs can interpolate between unconditional, weakly conditional (soft constraint), and hard-constraint regimes without retraining the underlying DDIM network (Jayaram et al., 2024).
  • Assumptions and limitations: Exact projection requires linear constraints, so CDDIM projection is not, in general, applicable to nonlinear observations. Conditioning via projection or masking is also not optimal for every domain; for example, generative fidelity may trade off against the degree of constraint satisfaction near boundaries.
  • Extension to noisy, multimodal, or structured inverse problems: Lagrangian or KL-based approaches in CDDIM permit handling of nontrivial noise models or distributional constraints, further broadening the scope of tasks addressable by conditional diffusion (Jayaram et al., 2024).
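For the noiseless linear case mentioned in the list above, the affine projection onto the constraint set $\{z : Az = y\}$ has a closed form. The sketch below uses the Moore-Penrose pseudo-inverse and is a generic linear-algebra illustration, not the CDIM algorithm; the operator `A`, the observation `y`, and the function name are assumptions.

```python
import numpy as np

def project_onto_linear_constraint(x, A, y):
    """Euclidean projection of x onto {z : A z = y}.

    Assumes the system A z = y is consistent (e.g. A has full row rank).
    """
    residual = A @ x - y
    return x - np.linalg.pinv(A) @ residual

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])   # assumed linear forward operator
y = np.array([2.0, 0.0])          # noiseless observation
x = np.array([0.0, 1.0, 3.0])     # current estimate

x_proj = project_onto_linear_constraint(x, A, y)
# x_proj is [2., -1., 1.], which satisfies A @ x_proj == y
```

An inpainting mask is the special case where each row of `A` selects a single known entry, in which case the projection reduces to overwriting those entries with their observed values.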

CDDIM presents a unifying, efficient, and generalizable paradigm for deterministic conditional generation across data modalities, enabling practical and scalable deployment in domains requiring precise spatiotemporal or feature-wise control.
