DiffMD: Denoising Diffusion Models

Updated 13 May 2026
  • DiffMD are generative models that iteratively remove known noise from input data, yielding robust representation learning and high-quality sample generation.
  • They couple a fixed forward Gaussian noising process with a parameterized reverse Markov process using neural networks to stably learn uncertainty and data distributions.
  • DiffMD extend to diverse domains such as fluid dynamics, imaging, and molecular simulations, offering benefits like improved uncertainty quantification and flexible conditioning.

Denoising Diffusion Models (DiffMD) are a family of generative and predictive models based on iterative, probabilistically grounded denoising. These models stochastically perturb an input (e.g., a clean image, fluid field, or molecular structure) using a known (often Gaussian) noise process, and train neural networks to reverse the degradation, thereby learning a rich representation of the underlying data manifold. Such models have set new standards for sample quality, uncertainty quantification, and application flexibility across a growing array of scientific and engineering domains.

1. Core Mathematical Formulation

Denoising diffusion models construct generative models by coupling a parameterized reverse Markov process with a fixed forward noising process. Given observed data $x_0$, the forward chain is typically defined as:

  • Forward (noising) process:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\, \sqrt{1-\beta_t}\,x_{t-1},\, \beta_t I\right)$$

with a cumulative schedule $\alpha_t = 1 - \beta_t$, $\hat{\alpha}_t = \prod_{i=1}^t \alpha_i$, resulting in the closed-form marginal

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\, \sqrt{\hat{\alpha}_t}\,x_0,\, (1-\hat{\alpha}_t) I\right)$$

as utilized by FluidDiff for spatiotemporal prediction (Yang et al., 2023), and in most canonical image models.

  • Reverse (denoising) process:

$$p_\theta(x_{t-1} \mid x_t, y) = \mathcal{N}\left(x_{t-1};\, \mu_\theta(x_t, t, y),\, \sigma_t^2 I\right)$$

where $\mu_\theta$ is predicted via a neural network from the noisy state, diffusion step, and optional context $y$ (e.g., initial condition, time, or conditioning variables). The canonical mean is related to noise prediction:

$$\mu_\theta(x_t, t, y) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1 - \alpha_t}{\sqrt{1 - \hat{\alpha}_t}}\,\epsilon_\theta(x_t, t, y)\right)$$

as in (Yang et al., 2023).
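The forward marginal and the noise-prediction form of the reverse mean can be sketched in a few lines of NumPy. This is only an illustrative sketch: the linear β schedule, its endpoints, and the function names `make_schedule`, `q_sample`, and `reverse_mean` are assumptions made here, not details taken from the cited papers. Array index `i` corresponds to diffusion step `t = i + 1`.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; returns beta_t, alpha_t, and cumulative alpha_hat_t."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas              # alpha_t = 1 - beta_t
    alpha_hat = np.cumprod(alphas)    # alpha_hat_t = prod_{i<=t} alpha_i
    return betas, alphas, alpha_hat

def q_sample(x0, i, alpha_hat, rng):
    """Draw x_t ~ q(x_t | x_0) from the closed-form marginal (i is a 0-based step index)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_hat[i]) * x0 + np.sqrt(1.0 - alpha_hat[i]) * eps
    return xt, eps

def reverse_mean(xt, i, eps_pred, alphas, alpha_hat):
    """Reverse-process mean mu_theta written in terms of a predicted noise eps_pred."""
    coef = (1.0 - alphas[i]) / np.sqrt(1.0 - alpha_hat[i])
    return (xt - coef * eps_pred) / np.sqrt(alphas[i])
```

A quick sanity check on the algebra: at the first step, where $\hat{\alpha}_1 = \alpha_1$, plugging the true noise into `reverse_mean` recovers $x_0$ exactly.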

DiffMDs are trained via simplified evidence lower bound (ELBO) objectives that, for Gaussian cases, reduce to

$$L(\theta) = \mathbb{E}_{x_0, \epsilon, t} \left\| \epsilon - \epsilon_\theta\left(\sqrt{\hat{\alpha}_t}\,x_0 + \sqrt{1 - \hat{\alpha}_t}\,\epsilon,\; t/T,\; y\right) \right\|_2^2$$

A variety of architectures (e.g., U-Nets, transformer-augmented U-Nets) and noise schedules are used depending on application (Yang et al., 2023, Zhang et al., 2023, Permenter et al., 2023).
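As a rough illustration of the simplified noise-prediction objective in this section, the following sketch computes a Monte Carlo estimate of $L(\theta)$ for one batch. The callable `eps_model(xt, t_frac, y)` stands in for any noise-prediction network. Its signature, the uniform sampling of $t$, and the name `simple_loss` are assumptions made for this example.

```python
import numpy as np

def simple_loss(eps_model, x0_batch, y_batch, alpha_hat, rng):
    """Monte Carlo estimate of E || eps - eps_theta(x_t, t/T, y) ||_2^2 over one batch."""
    T = len(alpha_hat)
    B = x0_batch.shape[0]
    # Assumption: one diffusion step per sample, drawn uniformly from {1, ..., T}.
    t = rng.integers(1, T + 1, size=B)
    # Broadcast alpha_hat_t over all non-batch dimensions.
    a = alpha_hat[t - 1].reshape((B,) + (1,) * (x0_batch.ndim - 1))
    eps = rng.standard_normal(x0_batch.shape)
    xt = np.sqrt(a) * x0_batch + np.sqrt(1.0 - a) * eps   # closed-form forward marginal
    pred = eps_model(xt, t / T, y_batch)                  # network sees normalized time t/T
    sq_err = ((eps - pred) ** 2).reshape(B, -1).sum(axis=1)
    return float(sq_err.mean())
```

In practice `eps_model` would be a trainable network and this quantity would be minimized with a gradient-based optimizer; the NumPy version only shows how the pieces of the objective fit together.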

2. Sampling and Inference Algorithms

DiffMD sampling inverts the forward process using either stochastic (Langevin-style) or deterministic (DDIM-style) updates:

  • Stochastic (Ancestral) sampling: Iteratively samples $x_{t-1}$ from the learned posterior given $x_t$ and random noise, as in

$$x_{t-1} = \mu_\theta(x_t, t, y) + \sigma_t z$$

for $z \sim \mathcal{N}(0, I)$, with $\sigma_t$ specified by the scheduler (Yang et al., 2023).

  • Deterministic sampling (DDIM/ODE): Removes sampling noise:

$$x_{t-1} = \sqrt{\hat{\alpha}_{t-1}}\,\frac{x_t - \sqrt{1-\hat{\alpha}_t}\,\epsilon_\theta(x_t, t, y)}{\sqrt{\hat{\alpha}_t}} + \sqrt{1-\hat{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t, y)$$

and can be recast as a discretized ODE, as in (Zhang et al., 2023). This is further enhanced by the quarter-circular reparameterization, which improves numerical stability and enables high-order solvers.
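The two sampling styles described in this section can be compared side by side in a small sketch. Several choices here are assumptions rather than details from the cited work: the variance choice $\sigma_t^2 = \beta_t$, the standard noise-free DDIM-style update, the `eps_model(x, t, y)` signature, and the toy `point_mass_eps_model` helper. The helper returns the exact noise predictor for a data distribution concentrated at a single point `mu`, which makes the loops easy to check.

```python
import numpy as np

def ancestral_sample(eps_model, shape, betas, rng, y=None):
    """Stochastic sampling: x_{t-1} = mu_theta(x_t, t, y) + sigma_t * z."""
    alphas = 1.0 - betas
    alpha_hat = np.cumprod(alphas)
    x = rng.standard_normal(shape)                 # start from pure noise x_T
    for t in range(len(betas), 0, -1):
        i = t - 1                                  # 0-based array index for step t
        eps = eps_model(x, t, y)
        mean = (x - (1.0 - alphas[i]) / np.sqrt(1.0 - alpha_hat[i]) * eps) / np.sqrt(alphas[i])
        z = rng.standard_normal(shape) if t > 1 else np.zeros(shape)  # no noise on last step
        x = mean + np.sqrt(betas[i]) * z           # assumed sigma_t^2 = beta_t
    return x

def ddim_sample(eps_model, shape, betas, rng, y=None):
    """Deterministic sampling: predict x_0, then move to the previous noise level."""
    alpha_hat = np.cumprod(1.0 - betas)
    x = rng.standard_normal(shape)
    for t in range(len(betas), 0, -1):
        i = t - 1
        a_t = alpha_hat[i]
        a_prev = alpha_hat[i - 1] if t > 1 else 1.0
        eps = eps_model(x, t, y)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x

def point_mass_eps_model(mu, alpha_hat):
    """Toy exact noise predictor when all data equals mu (for checking only)."""
    def eps_model(x, t, y=None):
        a = alpha_hat[t - 1]
        return (x - np.sqrt(a) * mu) / np.sqrt(1.0 - a)
    return eps_model
```

With the toy predictor, both loops end at `mu`, because the final step reduces to the model's estimate of $x_0$. The only randomness in `ddim_sample` is the initial draw of $x_T$, whereas `ancestral_sample` injects fresh noise at every step but the last.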

3. Extensions to Complex Domains and Advanced Variants

Denoising diffusion frameworks have been extended well beyond the conventional Euclidean, image-based generative setting:

  • Physical fields and PDEs: FluidDiff predicts nonlinear fluid fields from high-dimensional simulation data, learning the conditional dynamics without explicit physics priors. Its neural architecture uses U-Net blocks enhanced with time embeddings, self-attention, and explicit conditioning to forecast flow states. The model outperformed non-physics-informed neural baselines in short-term velocity-field prediction and generalized well to unseen initial conditions (Yang et al., 2023).
  • Post-training quantization: AccuQuant addresses quantization in diffusion models by simulating error accumulation over multiple denoising steps and introducing grouped-step calibration, reducing memory complexity from $\mathcal{O}(n)$ to $\mathcal{O}(1)$ in the number of denoising steps $n$. It achieves state-of-the-art FID-to-full-precision scores under low-bit quantization by explicitly aligning denoiser output distributions over multi-step groups (Lee et al., 23 Oct 2025).
  • Optimization and projection perspectives: Diffusion model denoisers can be interpreted as projection operators under the manifold hypothesis, with deterministic sampling resembling inexact gradient descent on the squared distance to the data manifold. Two-point gradient estimation samplers exploit this, yielding significant FID gains at low step counts relative to DDIM and related fast samplers (Permenter et al., 2023).
  • Domain-specific modifications: Linear interpolation between clean and real noisy images replaces classical Gaussian forward noising for robust real-world denoising (Yang et al., 2023). Patch masking (Masked Diffusion) replaces additive noise for self-supervised representation learning, enhancing downstream performance in segmentation tasks (Pan et al., 2023).
  • Parameterization and stability improvements: Quarter-circular reparameterization eliminates endpoint singularities and facilitates the deployment of high-order ODE solvers for faster and more stable sampling (Zhang et al., 2023).
  • Inference-time realignment: DeRaDiff provides a mechanism for continuous control of preference/KL-regularization strength during sampling via geometric mixtures of per-step posteriors, removing the need for multiple retrainings for hyperparameter sweeps (Manujith et al., 28 Jan 2026).
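To make the phrase "geometric mixture of per-step posteriors" concrete, the sketch below combines two Gaussian step posteriors with element-wise (diagonal) variances. It is not DeRaDiff's actual procedure; it only illustrates the general fact that $p_a^{\lambda}\,p_b^{1-\lambda}$, once renormalized, is again Gaussian with precision-weighted parameters. The function name `geometric_mixture` and the mixing weight `lam` are hypothetical.

```python
import numpy as np

def geometric_mixture(mu_a, var_a, mu_b, var_b, lam):
    """Parameters of the normalized density proportional to N(mu_a, var_a)^lam * N(mu_b, var_b)^(1 - lam).

    Variances are element-wise (diagonal); lam in [0, 1] is the mixing weight.
    """
    precision = lam / var_a + (1.0 - lam) / var_b          # precisions add, weighted by lam
    var = 1.0 / precision
    mean = var * (lam * mu_a / var_a + (1.0 - lam) * mu_b / var_b)
    return mean, var

# Usage: when both posteriors share the same variance, the mixture mean is a
# plain linear interpolation of the two means and the variance is unchanged.
mean, var = geometric_mixture(np.array([0.0]), np.array([2.0]),
                              np.array([4.0]), np.array([2.0]), lam=0.25)
```

Because `lam` enters only at sampling time, sweeping it does not require retraining either of the underlying models, which is the property the bullet on inference-time realignment highlights.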

4. Architectural and Algorithmic Considerations

Architectures for DiffMD are highly domain-adapted:

  • U-Net-based backbones: FluidDiff employs a four-scale U-Net whose blocks combine residual connections, group normalization, SiLU activations, and Transformer-style self-attention. Time is encoded via sinusoidal positional embeddings projected through MLPs (Yang et al., 2023).
  • Expanded conditioning: Inputs may include not only the noised signal but also physically meaningful conditioning maps (e.g., initial field, target time, auxiliary feature buffers).
  • Advanced embedding and ensembling: To permit real data with arbitrary noise models as input, methods like DMID introduce adaptive embeddings (e.g., VAE to match real noise to AWGN) and adaptive ensembling to balance perceptual quality and distortion (Li et al., 2023).
  • Self-supervised and masked objectives: Training losses extend beyond MSE to robust alternatives such as the Charbonnier loss or structural similarity index (SSIM), especially in regimes where fine structural recovery is necessary (Yang et al., 2023, Pan et al., 2023).
  • Hybrid output heads: Dual- (or multi-) output heads are deployed to predict both signal and noise for improved stability within the reverse chain, especially when using ODE-style inference (Zhang et al., 2023).
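Two of the ingredients listed above are simple enough to sketch directly: a sinusoidal embedding of the diffusion step and the Charbonnier loss. The embedding follows the common Transformer-style construction. The embedding size, the `max_period` base of 10000, and the Charbonnier constant `eps` are assumptions made here, not values reported for any of the cited models.

```python
import numpy as np

def sinusoidal_embedding(t, dim, max_period=10000.0):
    """Map diffusion steps t (shape [B]) to [B, dim] sin/cos features; dim is assumed even."""
    t = np.asarray(t, dtype=np.float64).reshape(-1, 1)
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to roughly 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = t * freqs
    return np.concatenate([np.sin(args), np.cos(args)], axis=1)

def charbonnier_loss(pred, target, eps=1e-3):
    """Smooth L1-like loss: mean of sqrt((pred - target)^2 + eps^2)."""
    return float(np.mean(np.sqrt((pred - target) ** 2 + eps ** 2)))
```

The embedding would typically be passed through a small MLP before being injected into the U-Net blocks. The Charbonnier loss behaves like an absolute error for large residuals while staying differentiable at zero.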

5. Evaluation, Empirical Results, and Applications

DiffMD have achieved strong empirical results across multiple benchmarks and tasks:

  • FluidDiff for CFD: On fluid velocity-field prediction, FluidDiff achieved MAE = 0.1975 and RMSE = 0.3137, outperforming cGAN and pure U-Net models in short-term prediction and generalization (Yang et al., 2023).
  • Quantized models: AccuQuant reduced FID2FP32 from 35.2 to 3.3 on CIFAR-10 at 6/6-bit and from 14.4 to 11.0 on text-to-image generation settings (Lee et al., 23 Oct 2025).
  • Fast high-fidelity samplers: Gradient-estimation sampler achieved FID of 3.9 on CIFAR-10 with 10 steps (vs. DDIM 16.9), and 4.3 on CelebA (vs. DDIM 18.1) (Permenter et al., 2023).
  • Robust real-world denoising: Linear-interpolation-based diffusion and SSIM/Charbonnier-trained models, even on simple CNN U-Nets, rivaled or exceeded Transformer architectures across SIDD and DND benchmarks, with PSNR/SSIM competitive with the state of the art (Yang et al., 2023).
  • Medical and scientific domains: Self-supervised DiffMD (e.g., DDM²) for diffusion MRI restored high-frequency anatomical detail and achieved +3.2 SNR gain over Patch2Self, operating with as few as n=1–2 prior volumes (Xiang et al., 2023).
  • Accelerated computation: DiffMD sampling typically requires one network evaluation per diffusion step, i.e., $T$ evaluations per frame (as in FluidDiff); while slower than one-shot models, inference is still orders of magnitude cheaper than full PDE/CFD solvers.
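For reference, the field-prediction metrics quoted above (MAE and RMSE) can be computed as follows. This is a generic sketch with hypothetical function names; it does not reproduce any paper's evaluation protocol, such as the normalization or the set of frames averaged over.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error over all elements of the predicted field."""
    return float(np.mean(np.abs(pred - target)))

def rmse(pred, target):
    """Root-mean-square error over all elements of the predicted field."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

RMSE is never smaller than MAE on the same data and weights large local errors more heavily, which is why the two are usually reported together for velocity-field prediction.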

6. Benefits, Limitations, and Prospective Developments

Benefits:

Limitations:

  • Inference speed is limited by the need for hundreds of network passes unless replaced by distilled, ODE, or hybrid samplers (Yang et al., 2023, Zhang et al., 2023).
  • Long-horizon predictions degrade in accuracy due to the lack of physical constraints and compounding errors, motivating the integration of physics-informed projections or operators (Yang et al., 2023).
  • Direct applicability to real-world or arbitrary noise, or different data manifolds, sometimes requires adaptation of the forward process or advanced embedding, e.g., noise modeling, masking, or VAE-projected inputs (Yang et al., 2023, Li et al., 2023).

Prospective directions:

  • Physics-informed guidance and projections for constrained domains, e.g., divergence-free projection in fluid simulation (Yang et al., 2023).
  • Faster or more efficient sampling via DDIM, high-order ODE solvers, and advanced quantization (Zhang et al., 2023, Lee et al., 23 Oct 2025).
  • Application to multi-scale and spatiotemporal PDE systems, spectral operators, and larger-scale molecular or physical ensembles (Yang et al., 2023, Wu et al., 2022).
  • Unification and principled generalization to arbitrary state spaces (Markov, continuous, discrete, manifold) within the denoising diffusion/score-matching paradigm (Benton et al., 2022).
