Diffusion and Reverse Process

Updated 19 December 2025
  • The diffusion (forward) and reverse processes are paired stochastic processes: the forward process progressively transforms data into a simple Gaussian distribution, while the reverse process is learned to reconstruct the original data.
  • The reverse process relies on neural score estimation or regression, as in DDPMs and score-based SDE models, to accurately denoise and recover complex data distributions.
  • Variants such as single-step latent diffusion and accelerated sampling schemes address computational efficiency, while analyses of support containment and convergence inform their use in practical applications.

A diffusion model consists of two fundamental stochastic processes defined over a high-dimensional space: the forward diffusion (noising) process, which progressively transforms a data distribution into a tractable reference distribution (typically Gaussian), and the reverse process, which is learned and deployed to reconstruct data by inverting this trajectory, either for generative modeling, inference, or related estimation tasks. Both processes possess rich analytical structure and underlie a range of state-of-the-art methods in computer vision, audio, probabilistic inference, and scientific computing.

1. Definition and Mathematical Structure of the Forward Diffusion Process

The forward process is a fixed, typically linear, Markov process that injects noise into input data $x_0$ over $T$ steps. In standard Denoising Diffusion Probabilistic Models (DDPMs), the process is parameterized by a variance schedule $\beta_t$ and defined recursively:

$$q(x_{1:T}\mid x_0) = \prod_{t=1}^T q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\, \sqrt{\alpha_t}\, x_{t-1},\, \beta_t \mathbf{I}\right),$$

with $\alpha_t = 1 - \beta_t$, yielding a closed-form marginal

$$q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\, \sqrt{\bar{\alpha}_t}\, x_0,\, (1 - \bar{\alpha}_t)\, \mathbf{I}\right), \qquad \bar{\alpha}_t = \prod_{i=1}^t \alpha_i.$$

As $t \to T$ with sufficiently small $\beta_t$, $q(x_T)$ approaches an isotropic Gaussian regardless of $x_0$ (Strümke et al., 2023, Cao et al., 28 Jan 2025, Le, 14 Dec 2024).
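To make the closed-form marginal concrete, here is a minimal PyTorch sketch of forward sampling, assuming an illustrative linear schedule; the helper name `q_sample` and the schedule constants are not taken from the cited papers:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # illustrative linear schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)      # \bar{alpha}_t = prod_{i<=t} alpha_i

def q_sample(x0: torch.Tensor, t: torch.Tensor, noise: torch.Tensor = None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form."""
    if noise is None:
        noise = torch.randn_like(x0)
    ab = alpha_bars.to(x0.device)[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over batch
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise, noise
```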

Specialized variants carry this structure to non-Euclidean spaces (e.g., SE(3) for rigid transformations in MonoSE(3)-Diffusion (Zhu et al., 12 Oct 2025)), constrained spaces like the probability simplex (Floto et al., 2023), or latent spaces compressed by pretrained autoencoders (Lin et al., 26 Jun 2024).

The forward chain is typically non-learned, with all parameters fixed prior to training, guaranteeing analytical tractability and stable convergence to the reference distribution.

2. Construction and Learning of the Reverse Process

The reverse process seeks to invert the noising trajectory, reconstructing samples from the simple reference distribution toward the original complex data distribution. In the continuous-time limit, the reverse process is a time-inhomogeneous stochastic differential equation (SDE):

$$dx = \left[f(x, t) - g(t)^2 \nabla_x \log p_t(x)\right] dt + g(t)\, d\bar{W}_t,$$

with $p_t(x)$ the marginal at time $t$ and $s_\theta(x, t) \approx \nabla_x \log p_t(x)$ a learned neural approximation of the score function (Dasgupta et al., 10 Apr 2025). In DDPMs and their variants, the reverse chain is parameterized as a Markov process:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2 \mathbf{I}\right),$$

where the mean is constructed to optimally "denoise" $x_t$, often by reparameterizing $\mu_\theta(x_t, t)$ to directly predict the added noise $\epsilon$ or, in regression-targeted setups, the underlying clean data (Le, 14 Dec 2024, Bai et al., 2023, Trachu et al., 10 Jun 2024). The training objective is an evidence lower bound (ELBO) or its practical surrogate, such as

$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\, t\right) \right\|^2,$$

or, for regression-diffusion methods, the mean-square error against the clean target (Trachu et al., 10 Jun 2024).
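A hedged sketch of the simple epsilon-prediction objective and a single ancestral reverse step, reusing `q_sample` and the schedule tensors from the sketch above; `eps_model` is a placeholder network, and $\sigma_t^2 = \beta_t$ is one common choice:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0):
    """L_simple: predict the injected noise from the noised input x_t and timestep t."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    xt, eps = q_sample(x0, t)
    return F.mse_loss(eps_model(xt, t), eps)

@torch.no_grad()
def reverse_step(eps_model, xt, t: int):
    """One DDPM ancestral step p_theta(x_{t-1} | x_t), using sigma_t^2 = beta_t."""
    eps_hat = eps_model(xt, torch.full((xt.shape[0],), t, device=xt.device))
    coef = betas[t] / (1.0 - alpha_bars[t]).sqrt()
    mean = (xt - coef * eps_hat) / alphas[t].sqrt()
    if t == 0:
        return mean                              # no noise added at the final step
    return mean + betas[t].sqrt() * torch.randn_like(xt)
```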

In the PDE perspective, the reverse process can be derived as an exact time-reversal of the Fokker–Planck equation, yielding a backward PDE of the form

$$\frac{\partial p(x, \tau)}{\partial \tau} = -\nabla \cdot \left[v(x, t)\, p(x, \tau)\right] + \frac{\alpha g(t)}{2}\,\Delta_x p(x, \tau),$$

where $v$ involves the drift, the score, and (optionally) the stochasticity parameter $\alpha$ (Dasgupta et al., 10 Apr 2025, Cao et al., 28 Jan 2025).
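In the continuous-time view, sampling typically discretizes the reverse SDE; a minimal Euler–Maruyama sketch for a variance-preserving model is shown below, where `score_model`, the linear $\beta(t)$, and the step count are illustrative assumptions rather than a specific paper's implementation:

```python
import torch

@torch.no_grad()
def reverse_sde_sample(score_model, shape, n_steps=500, beta_min=0.1, beta_max=20.0):
    """Euler-Maruyama integration of the VP reverse SDE from t = 1 down to t ~ 0."""
    def beta(t):                          # linear beta(t) schedule (illustrative)
        return beta_min + t * (beta_max - beta_min)

    x = torch.randn(shape)                # start from the Gaussian reference
    dt = 1.0 / n_steps
    for i in range(n_steps, 0, -1):
        t = torch.full((shape[0],), i / n_steps)
        b = beta(t).view(-1, *([1] * (len(shape) - 1)))
        drift = -0.5 * b * x - b * score_model(x, t)        # f(x,t) - g(t)^2 * score
        x = x - drift * dt + b.sqrt() * (dt ** 0.5) * torch.randn_like(x)
    return x
```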

For certain class-conditional or inverse problems, the reverse process generalizes to conditional distributions, where the score network is trained to model $\nabla_x \log p_t(x \mid y)$ (Dasgupta et al., 10 Apr 2025).

3. Algorithmic and Modeling Variants

a. Single-step Reverse Processes and Latent Diffusion

Recent advances demonstrate that it is possible to bypass the extensive multi-step reverse process typical in DDPMs. SDSeg (Lin et al., 26 Jun 2024) utilizes a “single-step latent estimation” strategy in compressed latent space rather than pixel space. By inverting the closed-form marginal,

$$z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, n,$$

and training a network $f_\theta(z_t, z_c)$ to estimate the noise $n$, the clean latent $z_0$ is estimated in one step:

$$\tilde{z}_0 = \frac{z_t - \sqrt{1 - \bar{\alpha}_t}\, \tilde{n}}{\sqrt{\bar{\alpha}_t}}.$$

Latent fusion by channel concatenation enables conditioning on side information (e.g., a medical image), replacing cross-attention and removing the need for multiple-sample averaging.
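A hedged sketch of this single-step latent estimate, reusing `alpha_bars` from the forward-process example; the noise network `f_theta`, the conditioning latent `z_c`, and the choice of timestep are assumptions for illustration:

```python
import torch

@torch.no_grad()
def single_step_latent_estimate(f_theta, z_t, z_c, t: int):
    """Recover the clean latent in one call by inverting the closed-form marginal."""
    n_hat = f_theta(torch.cat([z_t, z_c], dim=1), t)   # condition via channel concatenation
    ab = alpha_bars[t]
    return (z_t - (1.0 - ab).sqrt() * n_hat) / ab.sqrt()
```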

Similarly, Thunder (Trachu et al., 10 Jun 2024) casts the forward process as a Brownian bridge, parameterizes the reverse process via direct regression to $x_0$, and enables either pure regression or diffusion-based denoising depending on the reverse-time discretization, achieving competitive results with a single reverse step.
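For intuition about the Brownian-bridge forward process, a generic bridge marginal pinned between the clean signal and a degraded endpoint can be sampled as below; this is a standard bridge parameterization offered only as an illustration, not Thunder's exact formulation:

```python
import math
import torch

def brownian_bridge_sample(x0, y, t: float, sigma: float = 0.5):
    """Sample x_t from a bridge pinned at x_0 (t = 0) and y (t = 1):
    the mean interpolates linearly and the variance sigma^2 * t * (1 - t)
    vanishes at both endpoints."""
    mean = (1.0 - t) * x0 + t * y
    std = sigma * math.sqrt(t * (1.0 - t))
    return mean + std * torch.randn_like(x0)
```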

b. Flexible and Accelerated Sampling

Several recent works address the computational demands of the reverse process:

  • Truncated Diffusion/ES-DDPM (Zheng et al., 2022, Lyu et al., 2022): Stop the forward process at an intermediate step $S \ll T$, utilize a learned prior (GAN, VAE) at $x_S$, and perform only $S$ reverse steps, attaining sample quality comparable to or better than full-chain DDPMs at far lower computational cost (see the sketch after this list).
  • Reverse Diffusion Sequential Monte Carlo (Wu et al., 8 Aug 2025): Treat the reverse process as the proposal in an SMC sampler, introducing correction mechanisms for both time discretization and score approximation, and yielding consistent estimates of samples and normalizing constants.
  • RTK-MALA/ULD (Huang et al., 26 May 2024): Partition the reverse process into a small number of strongly log-concave subproblems, each solved rapidly using MALA or underdamped Langevin samplers, achieving provably improved convergence rates over standard diffusion inference.
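The truncated strategy in the first bullet can be summarized in a short sketch that reuses the `reverse_step` helper from the example in Section 2; `prior_sample` stands in for the auxiliary GAN/VAE prior and is an assumption for illustration:

```python
import torch

@torch.no_grad()
def truncated_reverse_sample(eps_model, prior_sample, S: int):
    """Run only S reverse steps, starting from x_S drawn from a learned prior
    (e.g., a GAN or VAE trained to match q(x_S)) instead of x_T ~ N(0, I)."""
    x = prior_sample()                    # approximate draw from q(x_S)
    for t in range(S - 1, -1, -1):        # S steps back to the data distribution
        x = reverse_step(eps_model, x, t)
    return x
```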

c. Non-standard Domains and Constraints

Forward and reverse processes can be constructed for constrained or structured domains:

  • Pose Estimation on SE(3) (Zhu et al., 12 Oct 2025): The forward process is built in a monocular-normalized SE(3) representation, enforcing visibility constraints to ensure transformations stay within the camera frustum. The reverse process iteratively refines pose through a denoising network, with explicit awareness of timestep-dependent noise scales.
  • Diffusion on the Probability Simplex (Floto et al., 2023): The forward SDE is constructed as an OU process on latent logits, mapped to the simplex via softmax; the reverse process is performed in latent space but results in categorical (or bounded) output.

d. Neural SDEs vs. ODEs in Reverse

It is established that including noise in the reverse process (i.e., using an SDE rather than a deterministic ODE) has regularizing advantages, enabling $L^2$-norm trajectory approximation for arbitrary continuous reference flows even when the target score is not Lipschitz, whereas deterministic flows require restrictive smoothness assumptions and only guarantee weak (Wasserstein) approximation (Elamvazhuthi et al., 2023). As a result, reverse SDEs can represent and track distributions more robustly and are recommended for modeling non-smooth or multimodal data.

4. Theoretical Properties and Boundary Effects

a. Support Containment

A key analytical result is that the support of the final distribution produced by the reverse process is contained within the support of the original training data. That is, the reverse-time dynamics satisfy

$$\mathrm{supp}\, q(\cdot, T_*) \subset \mathrm{supp}\, \rho_0,$$

meaning the analytic reverse diffusion can never generate samples outside the manifold of the training set (Cao et al., 28 Jan 2025). If the data distribution is discrete, the dynamics converge exactly to the observed points, preventing generalization in the analytic limit, so practical generalization must arise from score network approximation errors rather than the theoretical machinery.

b. Control-theoretic and PDE Interpretations

The forward and reverse processes can be cast as infinite-dimensional control systems. The SDE or PDE structure provides a means to "steer" the data density via the neural score function. In neural SDEs, the network weights act as closed-loop controls allowing universal $L^2$ reachability of target densities (Elamvazhuthi et al., 2023). PDE-based derivations unify variance-exploding (VE) and variance-preserving (VP) models, and provide explicit formulae for the sampling SDEs/ODEs driving the process (Dasgupta et al., 10 Apr 2025).

5. Applications and Domain-specific Instantiations

a. Biomedical Segmentation and Enhancement

SDSeg (Lin et al., 26 Jun 2024) demonstrates that by operating in the latent space, the diffusion and reverse processes can be utilized for segmentation tasks, achieving state-of-the-art Dice/IoU metrics in biomedical image analysis with only one reverse-network call and a simple concatenation-based latent fusion.

b. Speech Enhancement

Thunder (Trachu et al., 10 Jun 2024) applies a Brownian bridge diffusion, merging regression and diffusion into a single model, and achieves nearly full denoising quality with just one or a few reverse steps, while also improving out-of-domain robustness.

c. Graph Neural Networks

The message-passing structure of GNNs is mathematically equivalent to a forward diffusion process over node representations, leading to over-smoothing. Explicit inclusion of discrete or continuous-time reverse processes substantially improves heterophilic node classification accuracy and supports ultra-deep GNN stacks (Park et al., 11 Mar 2024).

d. Inverse Problems and Scientific ML

Diffusion and reverse process frameworks are extended to solve physics-based inverse problems, where conditional score models are trained to reconstruct posterior distributions for complex measurement operators (Dasgupta et al., 10 Apr 2025).

e. Quantum–Classical Connection

The semiclassical limit of quantum Lindblad dynamics connects the Petz map (quantum Bayes rule) to classical reverse diffusion, showing that the mathematical structure underlying generative diffusion models recovers classical probabilistic reversibility in the appropriate limit (Nasu et al., 21 Oct 2025).

6. Practical Training, Sampling, and Acceleration

  • Noise schedule: The choice of $\beta_t$ (linear, cosine, or custom) determines signal preservation and effective denoising (Le, 14 Dec 2024, Strümke et al., 2023); see the schedule sketch after this list.
  • Sampling algorithms: Euler–Maruyama discretization is predominant; higher-order samplers and predictor-corrector schemes are employed for accuracy–speed trade-offs (Dasgupta et al., 10 Apr 2025).
  • Parameter efficiency: Switching from multi-step to single-step or few-step reverse inference as in SDSeg, Thunder, and truncated diffusion models enables large reductions in computation and memory with comparable or even improved quantitative metrics (Lin et al., 26 Jun 2024, Zheng et al., 2022, Trachu et al., 10 Jun 2024).
  • Manifold learning: The reverse process, by following the learned score field, contracts noise-space samples back to the data manifold, implicitly modeling high-probability regions (Strümke et al., 2023).
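To illustrate the noise-schedule choice flagged in the first bullet above, the following sketch constructs a standard linear schedule and a widely used cosine schedule; the exact constants are conventional defaults rather than values prescribed by the cited references:

```python
import math
import torch

def linear_beta_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linearly spaced variances beta_1..beta_T."""
    return torch.linspace(beta_start, beta_end, T)

def cosine_beta_schedule(T: int, s: float = 0.008):
    """Cosine schedule: define abar(t) via a squared cosine, then recover beta_t
    from the ratio abar_t / abar_{t-1}."""
    steps = torch.arange(T + 1, dtype=torch.float64)
    abar = torch.cos(((steps / T) + s) / (1 + s) * math.pi / 2) ** 2
    abar = abar / abar[0]
    betas = 1.0 - (abar[1:] / abar[:-1])
    return betas.clamp(max=0.999).float()   # clip to keep every step non-degenerate
```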

7. Limitations, Generalization, and Open Problems

  • Support limitations: The analytic reverse process cannot generate outside the training support—practical generalization originates from imperfect score estimation and architectural choices (Cao et al., 28 Jan 2025).
  • Approximation quality: Stochastic reverse processes empirically outperform deterministic counterparts due to regularizing effects, but their performance depends on the capacity of the score network to approximate difficult vector fields (Elamvazhuthi et al., 2023).
  • Acceleration trade-offs: Truncated, early-stopped, and SMC-based samplers achieve large computational savings but can rely on auxiliary generative models or incur non-negligible bias if the assumed prior diverges from the true data marginal at the truncation point (Zheng et al., 2022, Wu et al., 8 Aug 2025).
  • Algorithm choice: The optimal partitioning of the reverse process (number of steps, log-concavity segmentation) remains domain- and data-dependent, with practical selection guided by a combination of theoretical guarantees and empirical validation (Huang et al., 26 May 2024).

References: (Lin et al., 26 Jun 2024, Trachu et al., 10 Jun 2024, Zhu et al., 12 Oct 2025, Park et al., 11 Mar 2024, Elamvazhuthi et al., 2023, Zheng et al., 2022, Wu et al., 8 Aug 2025, Dasgupta et al., 10 Apr 2025, Nasu et al., 21 Oct 2025, Cao et al., 28 Jan 2025, Floto et al., 2023, Bai et al., 2023, Zhang et al., 2022, Strümke et al., 2023, Huang et al., 26 May 2024, Lyu et al., 2022, Huang et al., 2023, Le, 14 Dec 2024)
