Conditional Denoising Diffusion Implicit Model
- CDDIM extends the deterministic DDIM sampler with explicit conditioning to solve inverse, inpainting, and imputation problems.
- It employs mask-based, context embedding, and projection methods to enforce hard or soft constraints during the reverse diffusion process.
- The model achieves significant speed-ups and improved reproducibility across various domains including medical imaging and seismic reconstruction.
A Conditional Denoising Diffusion Implicit Model (CDDIM) is an extension of the Denoising Diffusion Implicit Model (DDIM) framework, incorporating explicit conditioning to solve inverse, inpainting, imputation, or constrained generation problems. CDDIMs replace the stochastic DDPM reverse process with a deterministic or pseudo-deterministic sampler and integrate conditioning signals via architectural, loss-based, or mask-based mechanisms. This allows controllable, fast, and stable conditional sampling in high-dimensional generative tasks across medical imaging, tabular imputation, geoscience, seismic reconstruction, and more. CDDIMs have distinct instantiations in each application domain, but all rely on parameterizing the reverse diffusion trajectory with learned conditional noise predictors and deterministic, non-Markovian backward maps.
1. Diffusion Model Foundations and DDIM Formalism
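Denoising diffusion probabilistic models (DDPMs) leverage a forward process in which Gaussian noise is added to data over $T$ steps:
$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\big(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t\mathbf{I}\big),$$
with noise schedule $\{\beta_t\}_{t=1}^{T}$, $\alpha_t = 1-\beta_t$, and cumulative product $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$. The marginal is
$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_t;\ \sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\mathbf{I}\big), \quad\text{i.e.}\quad \mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon,\quad \boldsymbol\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I}).$$
DDPMs train a network $\boldsymbol\epsilon_\theta(\mathbf{x}_t, t)$ to predict the noise $\boldsymbol\epsilon$ added at time $t$ via a simple (optionally weighted) mean-squared error (MSE) loss.
DDIM sampling reconstructs data by reversing the diffusion process with the map
$$\mathbf{x}_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{\mathbf{x}}_0 + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t, t) + \sigma_t\mathbf{z},\qquad \mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I}),$$
where $\hat{\mathbf{x}}_0 = \big(\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol\epsilon_\theta(\mathbf{x}_t,t)\big)/\sqrt{\bar\alpha_t}$ is the predicted clean sample. Setting the noise injection to zero ($\sigma_t = 0$, i.e. $\eta = 0$) yields a strictly deterministic sampling trajectory, which is advantageous for reproducibility and low-variance outputs (Zhou et al., 5 Aug 2025). Sub-sampling the inference steps further accelerates sampling relative to DDPM.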
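A minimal NumPy sketch of the forward marginal and the deterministic DDIM map described above, assuming a linear $\beta$ schedule (one common choice, not necessarily the one used by any cited CDDIM). The helper names `make_schedule`, `q_sample`, and `ddim_step` are hypothetical. With an "oracle" predictor that returns the true noise, a single deterministic jump from $t$ to an earlier step lands exactly on the forward marginal at that step, which makes the construction easy to check.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption) and its cumulative product abar_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)
    return betas, alpha_bar

def q_sample(x0, t, alpha_bar, eps):
    """Forward marginal: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def ddim_step(x_t, eps_hat, t, t_prev, alpha_bar):
    """Deterministic DDIM update with sigma_t = 0 (eta = 0).

    t_prev = -1 denotes the clean-data endpoint, where abar is taken as 1.
    Returns (x_{t_prev}, x0_hat).
    """
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
    # Predicted clean sample from the current state and the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
    # Re-noise x0_hat to level t_prev along the same predicted noise direction.
    x_prev = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x_prev, x0_hat

# Usage: with the true noise as an oracle predictor, one jump from
# t = 999 to t = 500 lands on the forward marginal at t = 500.
rng = np.random.default_rng(0)
x0, eps = rng.standard_normal(8), rng.standard_normal(8)
_, alpha_bar = make_schedule()
x_t = q_sample(x0, 999, alpha_bar, eps)
x_500, _ = ddim_step(x_t, eps, 999, 500, alpha_bar)
```

Because no fresh noise is drawn inside `ddim_step`, repeated calls with the same inputs always return the same output, which is the reproducibility property the text attributes to $\sigma_t = 0$.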
2. Conditioning Mechanisms in CDDIM
CDDIMs introduce domain-specific conditioning to guide generation toward desired targets or constraints without fundamentally altering the DDIM formulation. Common mechanisms include:
- Mask-based spatial conditioning: Apply lesion, facies, or data-availability masks by concatenation as input channels and/or by partial projection at each sampling step (Zhang et al., 2 Oct 2025, Xu et al., 7 Mar 2026, Wei et al., 2023).
- Direct context embedding: For tabular data, observed features are concatenated, and a binary mask tags observed vs. missing entries (Zhou et al., 5 Aug 2025).
- Projection operators: For measurements, known values, or hard constraints, update each reverse step by projecting intermediate states so that known values are exactly preserved (Xu et al., 7 Mar 2026, Wei et al., 2023, Jayaram et al., 2024).
- Soft constraint enforcement: For inverse problems with noisy measurements, add a Lagrangian penalty or KL-divergence matching term to guide the solution distribution (Jayaram et al., 2024).
The objective in each case is to guarantee that the conditional input or constraint is satisfied either exactly (hard constraint) or distributionally (soft constraint).
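Mask-based and context-embedding conditioning both amount to giving the noise predictor extra input channels alongside the noisy state. The sketch below assumes a channel-first layout and a hypothetical helper `build_conditional_input`; the cited models (e.g., the feature-wise Transformer for tabular data) may embed these inputs differently.

```python
import numpy as np

def build_conditional_input(x_t, observed, mask):
    """Stack [noisy state, zero-filled observations, binary mask] as channels.

    mask == 1 marks known entries; unknown entries in `observed` may be NaN.
    The output gains a new leading channel axis of size 3.
    """
    known = np.asarray(mask).astype(bool)
    obs_filled = np.where(known, observed, 0.0)   # hide missing / NaN values
    return np.stack([x_t, obs_filled, known.astype(x_t.dtype)], axis=0)

# Usage on a single tabular row with two missing features.
x_t = np.random.default_rng(0).standard_normal(4)
observed = np.array([1.0, np.nan, 3.0, np.nan])
mask = np.array([1, 0, 1, 0])
net_in = build_conditional_input(x_t, observed, mask)   # shape (3, 4)
```

Passing the mask as its own channel lets the network distinguish a genuinely observed zero from a zero-filled missing value.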
3. Algorithms and Architectural Instantiations
The CDDIM framework can be instantiated with network and sampling architectures fitted to the modality and conditional structure:
| Domain | Conditioning Input | Network Backbone | Key Mechanism |
|---|---|---|---|
| Lesion filling/synthesis | Lesion mask, modalities | 2D U-Net w/ time & mask channels | Mask concat + region mixing |
| Tabular imputation | Obs. values, binary mask | Feature-wise Transformer + MLP | Input concat, deterministic |
| Facies/geology simulation | Well mask, facies code | U-Net with conditional input stack | Masked update + concat |
| Seismic trace interpolation | Known trace mask, values | U-Net + self-attention | Hard value projection |
| Noisy linear inverse problems | Forward operator, obs. | Pretrained DDIM net | Projective or Lagrangian sampling |
As an example, in multiple sclerosis lesion filling (MSRepaint), the model predicts noise from the current noised volume and the lesion mask. After each DDIM update, the predicted "denoised" image is mixed in only within the selected mask regions while the rest is kept frozen, and a forward step then re-injects noise, enforcing continuity at region boundaries (Zhang et al., 2 Oct 2025). For tabular imputation (MissDDIM), the forward process noises only the missing entries, and the reverse path deterministically imputes them, conditioned on the observed features and the mask (Zhou et al., 5 Aug 2025). In geomodelling (DiffSIM), known well values are masked in at every reverse step, guaranteeing that sampled geology honors the observed facies (Xu et al., 7 Mar 2026).
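The partial forward process described for MissDDIM, in which only missing entries are diffused, might look like the sketch below. This is an illustration under assumed shapes and a linear schedule, not MissDDIM's code; `noise_missing_only` is a hypothetical name.

```python
import numpy as np

def noise_missing_only(x0, obs_mask, t, alpha_bar, rng):
    """Diffuse only entries with obs_mask == 0; observed entries stay clean.

    Returns the partially noised sample and the noise drawn, which serves as
    the training target on the missing entries.
    """
    eps = rng.standard_normal(x0.shape)
    x_noised = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.where(np.asarray(obs_mask).astype(bool), x0, x_noised), eps

# Illustrative linear schedule and one tabular row with three missing features.
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
rng = np.random.default_rng(1)
row = np.arange(6.0)
obs_mask = np.array([1, 1, 0, 0, 1, 0])
x_t, eps = noise_missing_only(row, obs_mask, 500, alpha_bar, rng)
```

Keeping observed entries clean means they act as fixed context at every noise level, so the network learns to denoise missing entries conditioned on exact observations.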
4. Loss Functions and Training Strategies
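All CDDIMs adopt a noise-prediction loss, generically (with $\mathbf{c}$ denoting the conditioning input, such as masks or observed values):
$$\mathcal{L}(\theta) = \mathbb{E}_{\mathbf{x}_0,\,\boldsymbol\epsilon,\,t,\,\mathbf{c}}\Big[\big\|\boldsymbol\epsilon - \boldsymbol\epsilon_\theta(\mathbf{x}_t, t, \mathbf{c})\big\|_2^2\Big],$$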
with possible per-sample or per-voxel weights to focus learning capacity (e.g., on lesions or masked regions) (Zhang et al., 2 Oct 2025). No adversarial or perceptual components are required. In constrained or hard-conditioning CDDIMs, losses are often masked to apply only outside hard constraint regions (Xu et al., 7 Mar 2026).
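The generic loss, with optional per-element weights (e.g., up-weighting lesion voxels) and a mask that excludes hard-constraint regions, might be computed as below. NumPy is used for readability; a real training loop would use an autodiff framework. The function name and the normalization by the number of unmasked entries are assumptions.

```python
import numpy as np

def weighted_eps_loss(eps, eps_hat, weight=None, loss_mask=None):
    """Noise-prediction MSE with optional per-element weights and a loss mask.

    The weighted squared error is summed over entries with loss_mask == 1 and
    divided by the number of such entries (one normalization choice of several).
    """
    sq = (eps - eps_hat) ** 2
    w = np.ones_like(sq) if weight is None else np.asarray(weight, dtype=float)
    m = np.ones_like(sq) if loss_mask is None else np.asarray(loss_mask, dtype=float)
    n = max(m.sum(), 1.0)          # avoid division by zero for an empty mask
    return float((w * m * sq).sum() / n)

# Usage: plain MSE, then a weighted loss restricted to the last two entries.
eps, eps_hat = np.ones(4), np.zeros(4)
plain = weighted_eps_loss(eps, eps_hat)
focused = weighted_eps_loss(eps, eps_hat, weight=[1, 1, 2, 2], loss_mask=[0, 0, 1, 1])
```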
Practical training enhancements include contrast dropout in imaging (randomly zeroing input modalities), self-masking in tabular data (randomizing pseudo-missing patterns to enable imputation of arbitrary missing structures), and multi-view inversion plus fusion for 3D consistency in volumetric data (Zhang et al., 2 Oct 2025, Zhou et al., 5 Aug 2025).
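Self-masking and contrast dropout can be sketched as two small random-masking helpers. Both functions are hypothetical, and the independent per-entry (or per-modality) Bernoulli sampling and drop probabilities are assumptions; MissDDIM and MSRepaint may sample their patterns differently.

```python
import numpy as np

def self_mask(obs_mask, drop_prob, rng):
    """Hide a random subset of observed entries to act as pseudo-missing targets.

    Returns (cond_mask, target_mask): the model sees cond_mask entries and is
    supervised on target_mask entries, whose ground truth is known.
    """
    observed = np.asarray(obs_mask).astype(bool)
    hide = (rng.random(observed.shape) < drop_prob) & observed
    return observed & ~hide, hide

def contrast_dropout(modalities, p, rng):
    """Zero each input modality (leading axis) independently with probability p."""
    keep = rng.random(modalities.shape[0]) >= p
    shape = (-1,) + (1,) * (modalities.ndim - 1)
    return modalities * keep.reshape(shape)

rng = np.random.default_rng(0)
obs = np.array([1, 1, 1, 0, 0, 1, 1, 1])
cond_mask, target_mask = self_mask(obs, 0.5, rng)
mods = contrast_dropout(np.ones((3, 4, 4)), 0.3, rng)   # e.g., 3 input contrasts
```

Because pseudo-missing entries are drawn from observed ones, the model is trained on arbitrary missing patterns while always having ground truth for the loss.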
5. Inference Procedures and Acceleration
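Conditional DDIM samplers iteratively denoise noisy data using the trained noise predictor $\boldsymbol\epsilon_\theta$, integrating conditional constraints at each time step. For hard constraints, each intermediate reverse state is projected:
$$\mathbf{x}_{t-1} \leftarrow \mathbf{M}\odot\big(\sqrt{\bar\alpha_{t-1}}\,\mathbf{y} + \sqrt{1-\bar\alpha_{t-1}}\,\boldsymbol\epsilon\big) + (\mathbf{1}-\mathbf{M})\odot\mathbf{x}_{t-1},\qquad \boldsymbol\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I}),$$
where $\mathbf{M}$ is the constraint mask (1 at known entries), $\mathbf{y}$ holds the known values, and $\odot$ denotes elementwise multiplication (Wei et al., 2023). Projection or mixing is repeated as necessary to ensure constraint satisfaction, and for image domains, multiple orientation passes (axial, coronal, sagittal) can be fused for improved 3D consistency (Zhang et al., 2 Oct 2025).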
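A minimal sketch of a conditional DDIM sampling loop with the hard projection described above. It assumes the known values are re-noised to the current level with a single fixed noise draw, so the whole trajectory is deterministic for a given seed; at the clean endpoint $\bar\alpha$ is taken as 1, so known entries are reproduced exactly. `eps_model` is a hypothetical stand-in for a trained network, and the toy predictor below only exercises the control flow.

```python
import numpy as np

def conditional_ddim_sample(eps_model, y, mask, alpha_bar, timesteps, seed=0):
    """Deterministic DDIM sampling with hard projection onto known values.

    eps_model(x_t, t): hypothetical noise predictor returning an array like x_t.
    y: known values (entries where mask == 0 are ignored and may be NaN).
    mask: 1 where values are known.
    timesteps: decreasing integer steps, e.g. [999, 899, ..., 99].
    """
    rng = np.random.default_rng(seed)
    known = np.asarray(mask).astype(bool)
    y = np.where(known, y, 0.0)
    eps_known = rng.standard_normal(y.shape)   # fixed noise for the known entries
    x = rng.standard_normal(y.shape)           # x_T ~ N(0, I)
    steps = list(timesteps) + [-1]             # -1 denotes the clean endpoint
    for t, t_prev in zip(steps[:-1], steps[1:]):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t_prev] if t_prev >= 0 else 1.0
        eps_hat = eps_model(x, t)
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat  # sigma_t = 0
        # Hard projection: overwrite known entries with y diffused to level t_prev.
        y_t = np.sqrt(a_prev) * y + np.sqrt(1.0 - a_prev) * eps_known
        x = np.where(known, y_t, x)
    return x

def toy_eps_model(x_t, t):
    """Stand-in for a trained network (ignores t)."""
    return 0.1 * x_t

alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))
y = np.array([2.0, np.nan, -1.0, np.nan])
mask = np.array([1, 0, 1, 0])
sample = conditional_ddim_sample(toy_eps_model, y, mask, alpha_bar, range(999, -1, -100))
```

Drawing `eps_known` once, rather than at every step, is the design choice that keeps the sampler deterministic; re-drawing it would give a stochastic, RePaint-like variant.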
Inference-step sub-sampling and deterministic updates yield significant acceleration: for example, MSRepaint processes a single volume in 3 min on an RTX-5000, faster than prior diffusion inpainting methods (Zhang et al., 2 Oct 2025). MissDDIM achieves single-pass imputation at inference, versus 100-fold ensemble aggregation for stochastic DDPMs (Zhou et al., 5 Aug 2025). In CDIM (general linear inverse problems), the deterministic sampler needs only a small number of network calls, set by the number of DDIM skip-steps and inner update steps per skip-step, delivering a 20–50× acceleration over legacy conditional samplers (Jayaram et al., 2024).
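Inference-step sub-sampling can be sketched as choosing an evenly spaced, decreasing subset of the training timesteps. Even spacing is one common choice among several; both helpers are hypothetical, and `network_calls` assumes each sampled step costs `inner_steps` evaluations of the noise predictor, an illustrative cost model rather than CDIM's exact accounting.

```python
import numpy as np

def ddim_timesteps(T, num_steps):
    """Evenly spaced, strictly decreasing subset of {0, ..., T-1}.

    Assumes num_steps <= T so that the rounded steps stay distinct.
    """
    ts = np.linspace(0, T - 1, num_steps).round().astype(int)
    return ts[::-1].tolist()

def network_calls(num_steps, inner_steps=1):
    """Noise-predictor evaluations if each sampled step runs `inner_steps` calls."""
    return num_steps * inner_steps

ts = ddim_timesteps(1000, 50)
speedup = network_calls(1000) / network_calls(len(ts))   # full-length vs 50 steps
```

For example, 50 sub-sampled steps out of $T = 1000$ give a 20-fold reduction in network evaluations relative to full-length ancestral sampling.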
6. Application Domains and Empirical Performance
CDDIMs have demonstrated strong performance in diverse, domain-specific conditional generation tasks:
- Medical imaging: Bidirectional lesion filling and synthesis with regionwise precision and clinically practical runtime, outperforming classical FSL and NiftySeg and matching FastSurfer-LIT with an order-of-magnitude speedup (Zhang et al., 2 Oct 2025).
- Tabular data imputation: Deterministic imputation with lower RMSE and shorter inference time than CSDI and MissDiff, zero output variance across runs, and strong downstream F1/MAE (Zhou et al., 5 Aug 2025).
- Geostatistical simulation: Geological realizations conditioned on sparse well data preserve both global distributional statistics and hard constraints, enabling between-well interpolation under exact constraint at input locations (Xu et al., 7 Mar 2026).
- Seismic interpolation: Efficient, structure-preserving interpolation of missing seismic traces with coherence-corrected resampling, robust to variable missing patterns, with inference on the order of seconds at scale (Wei et al., 2023).
- Linear inverse imaging: CDIM enforces hard or soft measurement constraints with minimal cost, yielding FID and LPIPS competitive with state-of-the-art samplers while running 20–50× faster (Jayaram et al., 2024).
Key factors include the ability to condition explicitly (masks, observed values), deterministic inference for repeatability, and fast low-step sampling via DDIM sub-sampling.
7. Theoretical Guarantees, Limitations, and Extensions
The deterministic nature of CDDIM yields several analytical and practical properties:
- Exact constraint satisfaction: For noiseless problems with linear constraints, CDDIMs can achieve exact recovery; the projection is affine and yields the constrained solution in the limit, which can be realized numerically with sufficient inner optimization steps (Jayaram et al., 2024).
- Stable, low-variance outputs: Absence of stochasticity in the reverse map ensures high repeatability and reduces the need for ensemble or median voting (Zhou et al., 5 Aug 2025).
- Flexibility in conditioning: Depending on architecture, CDDIMs can interpolate between unconditional, weakly conditional (soft constraint), and hard-constraint regimes without retraining the underlying DDIM network (Jayaram et al., 2024).
- Assumptions and limitations: Exact projection requires linear constraints, so CDDIM projection does not, in general, extend to nonlinear observation models. Conditioning via projection or masking is also not universally optimal; for example, generative fidelity may trade off against the degree of constraint satisfaction near mask boundaries.
- Extension to noisy, multimodal, or structured inverse problems: Lagrangian or KL-based approaches in CDDIM permit handling of nontrivial noise models or distributional constraints, further broadening the scope of tasks addressable by conditional diffusion (Jayaram et al., 2024).
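For linear measurements $A\mathbf{x} = \mathbf{y}$, the hard- and soft-constraint regimes above can be illustrated with two generic NumPy helpers: an exact Euclidean projection onto the affine set (assuming $A$ has full row rank) and one gradient step on a quadratic penalty $\lambda\|A\mathbf{x}-\mathbf{y}\|^2$. These are textbook operations shown for intuition, not the CDIM update rules themselves; the helper names are hypothetical.

```python
import numpy as np

def project_affine(x, A, y):
    """Euclidean projection of x onto {z : A z = y}; A must have full row rank."""
    residual = A @ x - y
    return x - A.T @ np.linalg.solve(A @ A.T, residual)

def soft_constraint_step(x, A, y, lam, lr):
    """One gradient-descent step on the penalty lam * ||A x - y||^2."""
    grad = 2.0 * lam * (A.T @ (A @ x - y))
    return x - lr * grad

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))    # 3 linear measurements of a 6-dim signal
y = rng.standard_normal(3)
x = rng.standard_normal(6)
x_hard = project_affine(x, A, y)                           # A x = y up to float error
x_soft = soft_constraint_step(x, A, y, lam=1.0, lr=1e-3)   # only shrinks the residual
```

The projection is an affine map of `x`, matching the exact-recovery argument for noiseless linear constraints; the soft step trades exactness for tolerance to measurement noise, with `lam` controlling how strongly the data term is enforced.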
CDDIM presents a unifying, efficient, and generalizable paradigm for deterministic conditional generation across data modalities, enabling practical and scalable deployment in domains requiring precise spatiotemporal or feature-wise control.