Unified Denoising Perspective

Updated 13 August 2025
  • Unified Denoising Perspective is a framework that unifies Markovian generative models by framing forward noising and backward denoising within an operator-theoretic and variational formulation.
  • It employs a variational loss based on measure transport to control the discrepancy between the true data distribution and the model-induced distribution.
  • The approach generalizes classical score matching by incorporating continuous diffusions, discrete jump processes, and Lévy dynamics under a single formalism.

A unified denoising perspective encompasses a broad family of generative models—most notably those based on Markovian stochastic dynamics such as diffusion, flow-based, and jump process models—by systematically framing both the forward (noising) and backward (denoising) processes within a mathematically rigorous operator-theoretic and variational context. This unification enables explicit construction and analysis of model classes capable of efficiently transporting probability measures from complex data distributions to simple reference distributions and vice versa, while generalizing both continuous and discrete denoising models under a single formalism (Ren et al., 2 Apr 2025).

1. Markovian and Generator-Theoretic Foundations

The cornerstone of this unified approach is the theory of Feller evolution systems and their associated semigroups and generators. Given a Markov process $(x_t)$ on a locally compact space $E$ with base measure $\mu$, the evolution operators $(U_{t,s})$ are defined by

$$U_{t,s} f(x) = \mathbb{E}[f(x_t) \mid x_s = x]$$

with strong continuity, contractivity, and positivity preservation on $C_0(E)$, the space of continuous functions vanishing at infinity.

For time-homogeneous models, a Feller semigroup $(T_t)_{t \geq 0}$ with infinitesimal generator

$$A f(x) = \lim_{h \to 0} \frac{T_h f(x) - f(x)}{h}$$

determines the evolution of observables, while its adjoint operator $A^*$ governs the Fokker–Planck (Kolmogorov forward) equation for probability densities:

$$\partial_t p_t = A^* p_t$$

Time-inhomogeneity is accommodated by augmenting the process to the space $[0,T] \times E$ and introducing the right generator

$$\mathcal{A} f(s,x) = \partial_s f(s,x) + A_s f(s,x)$$

This construction allows analysis of both forward and backward dynamics through generator calculus, enabling explicit operator-based control of denoising processes.
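
As a concrete check of the generator definition, the sketch below (a toy illustration; the Ornstein–Uhlenbeck drift, constant diffusion, test function, and step size are all assumptions, not taken from the paper) compares a Monte Carlo estimate of $(T_h f(x) - f(x))/h$ with the analytic expression $b(x) f'(x) + \tfrac{1}{2}\sigma^2 f''(x)$ for a one-dimensional diffusion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Numerical illustration of the generator definition A f = lim_{h->0} (T_h f - f)/h
# for a 1D Ornstein-Uhlenbeck diffusion dx = -x dt + sigma dW (toy parameters assumed).
sigma = 1.0
b = lambda x: -x                        # drift coefficient
f = lambda x: np.sin(x)                 # smooth test observable
f1 = lambda x: np.cos(x)                # f'
f2 = lambda x: -np.sin(x)               # f''

def semigroup(x, h, n_samples=500_000):
    """Monte Carlo estimate of T_h f(x) = E[f(x_h) | x_0 = x] with one Euler step."""
    x_h = x + b(x) * h + sigma * np.sqrt(h) * rng.standard_normal(n_samples)
    return f(x_h).mean()

x0, h = 0.7, 1e-2
finite_diff = (semigroup(x0, h) - f(x0)) / h            # (T_h f(x0) - f(x0)) / h
analytic = b(x0) * f1(x0) + 0.5 * sigma**2 * f2(x0)     # b f' + (1/2) sigma^2 f''

print(f"(T_h f - f)/h  ≈ {finite_diff:.3f}")
print(f"b f' + ½σ² f'' = {analytic:.3f}")
```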

2. Unified Forward/Backward Process Construction

Within this formalism, the forward process is a Markov process that progressively transforms (by noising) a sample from the data distribution $p_0$ so that, at a terminal time $T$, the distribution $p_T$ becomes analytically convenient (e.g., Gaussian, uniform). An example for continuous diffusions is

$$dx_t = b_t(x_t)\, dt + \sigma_t(x_t)\, dW_t$$

where the generator is

$$A_t f(x) = b_t(x) \cdot \nabla f(x) + \frac{1}{2}\, \sigma_t(x)\sigma_t(x)^T : \nabla^2 f(x)$$

The backward (denoising) process is the time-reversal of the forward process. The key result is that, for marginal density $p_t$,

$$\mathcal{C}_t f = p_t^{-1} A_t^*(p_t f) - p_t^{-1} f\, A_t^* p_t = A_t^* f + p_t^{-1}\, \Gamma_t^*(p_t, f)$$

with $\Gamma_t^*$ the carré du champ operator associated to $A_t^*$. This explicit construction enables generating data by simulating the backward process from $p_T$, “denoising” back to $p_0$.
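
The following minimal sketch illustrates this forward/backward construction in one dimension, assuming an Ornstein–Uhlenbeck forward SDE and a Gaussian data distribution so that the exact score $\nabla \log p_t$ is available in closed form in place of a learned approximation; it is an illustration of the time-reversal recipe, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup (not from the paper): forward OU noising dx = -x dt + sqrt(2) dW,
# data distribution p_0 = N(m0, v0), so p_t and its score are available in closed form.
m0, v0, T, n_steps, n_samples = 2.0, 0.25, 3.0, 600, 50_000
dt = T / n_steps

def marginal(t):
    """Mean and variance of p_t under the OU forward process."""
    mean = m0 * np.exp(-t)
    var = v0 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return mean, var

def score(x, t):
    """∇ log p_t(x) for the Gaussian marginal (stands in for a learned score network)."""
    mean, var = marginal(t)
    return -(x - mean) / var

# Backward (denoising) pass: start from the reference p_T and integrate the
# time-reversed SDE dy = [y + 2 * score_{T-s}(y)] ds + sqrt(2) dW with Euler-Maruyama.
mean_T, var_T = marginal(T)
y = mean_T + np.sqrt(var_T) * rng.standard_normal(n_samples)
for k in range(n_steps):
    t = T - k * dt
    drift = y + 2.0 * score(y, t)
    y = y + drift * dt + np.sqrt(2 * dt) * rng.standard_normal(n_samples)

print(f"generated mean ≈ {y.mean():.3f}  (target {m0})")
print(f"generated var  ≈ {y.var():.3f}  (target {v0})")
```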

3. Variational Formulation for Model Design

The framework introduces a unifying variational objective for denoising models based on a measure transport (change-of-measure) argument. The Kullback–Leibler divergence between the true data distribution and the model (generator-induced) distribution is controlled by an integrated functional:

$$\mathcal{L}[\theta] = \mathbb{E}\left[ \int_0^T \left( \text{Term}_{\text{diffusion}} + \text{Term}_{\text{jump}} \right) dt \right]$$

For the diffusion case, this has the form

$$\mathbb{E}\left[ \int_0^T \frac{1}{2} D_t(x) : \left( \nabla \log \varphi_t(x) - \nabla \log p_t(x) \right) \left( \nabla \log \varphi_t(x) - \nabla \log p_t(x) \right)^T dt \right]$$

with $\varphi_t$ (often parametrized by a neural network) approximating the (unknown) marginal $p_t$. The variational loss thus directly measures the discrepancy in measure transport induced by the estimated backward generator.
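
As a toy illustration of the diffusion term (not the paper's training pipeline), the sketch below assumes one-dimensional Gaussian forms for both $p_t$ and $\varphi_t$, so their scores are available in closed form, and estimates the discrepancy term at a single time slice by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(2)

# One-dimensional illustration of the diffusion term of the variational loss
# (assumed Gaussian marginals for the true p_t and the model phi_t; toy parameters).
D_t = 2.0                                    # diffusion coefficient at this time slice
p_mean, p_var = 0.0, 1.0                     # true marginal p_t
phi_mean, phi_var = 0.3, 1.5                 # model marginal phi_t

score_p = lambda x: -(x - p_mean) / p_var
score_phi = lambda x: -(x - phi_mean) / phi_var

# Monte Carlo estimate of E_{x ~ p_t}[ (1/2) D_t (∇log phi_t(x) - ∇log p_t(x))^2 ].
x = p_mean + np.sqrt(p_var) * rng.standard_normal(200_000)
gap = score_phi(x) - score_p(x)
print(f"diffusion discrepancy term ≈ {0.5 * D_t * np.mean(gap**2):.4f}")
# The term is strictly positive here and vanishes exactly when phi_t = p_t,
# which is what drives the model marginals toward the true ones during training.
```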

4. Unifying Classical and Generalized Score-Matching

Classical score matching (Hyvärinen, 2005) seeks to learn the gradient of the log-density (the “score”) by minimizing

$$\mathbb{E}\, \big\| s_t^{(\theta)}(x) - \nabla \log p_t(x) \big\|^2$$

The unified denoising Markov model framework generalizes this notion to a broad class of Markov processes. For continuous diffusions,

$$\mathcal{L}_{\text{SM}}[\theta] = \mathbb{E}_{x_0 \sim p_0} \left[ \int_0^T \mathbb{E}_{x_t \sim p_{t|0}} \left[ \frac{1}{2} D_t(x_t) : \big( s_t^{(\theta)}(x_t) - \nabla \log p_{t|0}(x_t \mid x_0) \big) \big( s_t^{(\theta)}(x_t) - \nabla \log p_{t|0}(x_t \mid x_0) \big)^T \right] dt \right]$$

For jump processes, the loss targets discrete score functions (e.g., density ratios) corresponding to transitions in the jump kernel. These constructions ensure a unified adaptation of the score-matching principle across continuous diffusions, discrete-jump (Markov chain) models, and more general Lévy–Itô processes.
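
A hedged sketch of the continuous-diffusion instance of this loss is given below; the Ornstein–Uhlenbeck forward process, the small MLP score network, and the toy two-mode dataset are assumptions chosen for illustration, not the paper's architecture or benchmarks. Sampling would then proceed by plugging the learned $s_t^{(\theta)}$ into the time-reversed dynamics of Section 2.

```python
import torch
import torch.nn as nn

# Minimal denoising score-matching sketch for the continuous-diffusion case
# (assumed setup: OU forward noising dx = -x dt + sqrt(2) dW, so that
#  p_{t|0} = N(x0 * exp(-t), 1 - exp(-2t)); the MLP and toy data are illustrative).

class ScoreNet(nn.Module):
    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def dsm_loss(score_net, x0, T=3.0, eps=5e-2):
    """Monte Carlo estimate of the generalized score-matching objective."""
    t = eps + (T - eps) * torch.rand(x0.shape[0], 1)        # uniform time sampling
    mean = x0 * torch.exp(-t)
    var = 1.0 - torch.exp(-2.0 * t)
    xt = mean + var.sqrt() * torch.randn_like(x0)           # x_t ~ p_{t|0}( . | x0)
    target = -(xt - mean) / var                              # ∇ log p_{t|0}(x_t | x0)
    D_t = 2.0                                                # diffusion coefficient of the SDE
    return 0.5 * D_t * ((score_net(xt, t) - target) ** 2).sum(-1).mean()

# Usage sketch: fit the score of a toy two-mode 2D dataset.
score_net = ScoreNet()
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
for step in range(2000):
    modes = torch.randint(0, 2, (256, 1)) * 6.0 - 3.0       # centers at -3 and +3
    x0 = torch.randn(256, 2) + modes
    loss = dsm_loss(score_net, x0)
    opt.zero_grad(); loss.backward(); opt.step()
```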

5. Generality via Lévy–Khintchine Representation

Under regularity assumptions (notably the Feller property and Courrège’s theorem), the class of admissible forward processes encompasses arbitrary Lévy-type processes, with generator

$$A_t f(x) = b_t(x) \cdot \nabla f(x) + \frac{1}{2} D_t(x) : \nabla^2 f(x) + \int_{\mathbb{R}^d \setminus \{0\}} \big( f(x+z) - f(x) - z \cdot \nabla f(x)\, \chi(z) \big)\, \lambda_t(x, dz)$$

where $b_t(x)$ is a drift, $D_t(x)$ a (possibly state-dependent) diffusion matrix, and $\lambda_t(x, dz)$ a Lévy measure. This covers classical SDEs, geometric Brownian motion, and pure-jump (e.g., Poisson or compound Poisson) processes, subsuming discrete and continuous denoising models alike.
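
To make the three contributions of this generator concrete, the sketch below (illustrative parameters; a finite-activity jump measure so the compensator $\chi$ can be taken to vanish) evaluates the drift, diffusion, and jump terms of $A_t f$ at a point, estimating the Lévy integral by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of a Lévy-type generator applied to a test function (assumed parameters;
# finite-activity jumps with Gaussian sizes, so the small-jump cutoff chi is 0).
b, D = -0.5, 1.0                      # drift and diffusion coefficients
jump_rate, jump_std = 2.0, 0.3        # Lévy measure: jump_rate * N(0, jump_std^2)

f  = lambda x: np.exp(-x**2)          # smooth test function
f1 = lambda x: -2 * x * np.exp(-x**2)             # f'
f2 = lambda x: (4 * x**2 - 2) * np.exp(-x**2)     # f''

def generator(x, n_mc=100_000):
    drift_term = b * f1(x)
    diffusion_term = 0.5 * D * f2(x)
    z = jump_std * rng.standard_normal(n_mc)          # jump sizes ~ N(0, jump_std^2)
    jump_term = jump_rate * np.mean(f(x + z) - f(x))  # integral against the Lévy measure
    return drift_term + diffusion_term + jump_term

x0 = 0.8
print(f"A f({x0}) ≈ {generator(x0):.4f}  (drift + diffusion + jump contributions)")
```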

6. Practical Instantiations and Empirical Demonstration

The paper illustrates the practical flexibility of the framework with concrete examples:

  • Geometric Brownian motion: forward SDE $dx_t = x_t \odot (\mu\, dt + \sigma\, dW_t)$, denoised via the time-reversed SDE. This demonstrates the method’s applicability to processes with non-constant (state-dependent) diffusion.
  • Jump processes: Markov chain with discrete states and generator

$$A_t f(x) = \sum_{y \neq x} \big( f(y) - f(x) \big)\, \lambda_t(y, x)$$

Backward process jump rates are reweighted by the density ratio $p_t(y)/p_t(x)$. This formulation includes and generalizes discrete diffusion models on finite spaces; a minimal sketch of such a discrete denoising chain is given below. Empirical results highlight the recovery of complex geometric distributions (e.g., Swiss roll, moons).
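
The following sketch is a minimal instance of the discrete construction under assumed choices (a five-state uniform-corruption chain and a hand-picked data distribution, not the paper's benchmarks): the forward marginals are obtained from the Kolmogorov forward equation, and the backward chain with rates reweighted by $p_t(y)/p_t(x)$ is simulated to recover $p_0$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)

# Sketch of a discrete denoising Markov chain on a small finite state space
# (illustrative rates, data distribution, and step counts; not the paper's experiments).
S, T = 5, 2.0
p0 = np.array([0.70, 0.10, 0.10, 0.05, 0.05])    # "data" distribution on 5 states

# Forward generator: uniform corruption with unit rate to every other state.
Q = np.ones((S, S)) - S * np.eye(S)

n_steps = 200
dt = T / n_steps
t_grid = T - dt * np.arange(n_steps)              # reverse-time grid T, T-dt, ..., dt
p_grid = np.array([p0 @ expm(t * Q) for t in t_grid])   # forward marginals p_t

def denoise_one():
    """Simulate the backward chain with rates reweighted by p_t(y)/p_t(x)."""
    x = rng.choice(S, p=p_grid[0] / p_grid[0].sum())     # start from p_T
    for pt in p_grid:
        rates = Q[:, x] * pt / pt[x]              # lambda_bar(x, y) = Q(y, x) p_t(y)/p_t(x)
        rates[x] = 0.0
        probs = rates * dt
        probs[x] = 1.0 - probs.sum()              # probability of staying put this step
        x = rng.choice(S, p=probs)
    return x

samples = np.array([denoise_one() for _ in range(2000)])
est = np.bincount(samples, minlength=S) / len(samples)
print("recovered p_0 ≈", np.round(est, 2), " target:", p0)
```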

7. Statistical Physics Connections: Doob’s h-Transform

A central theoretical underpinning ties denoising Markov models to nonequilibrium statistical mechanics via the generalized Doob $h$-transform. The backward generator is represented as

$$A_{\text{est}} f = A^* f + \varphi_t^{-1}\, \Gamma^*(\varphi_t, f)$$

Minimizing the variational loss effectively reweights the forward process path measure—a perspective aligning with entropy production and fluctuation-dissipation theorems. This connection provides both an interpretation of denoising in terms of optimal stochastic control and a principled means to design denoising generators using a variational approach.
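
In the discrete-state analogue, this relation says the Doob-transformed generator simply reweights jump rates by $\varphi_t(y)/\varphi_t(x)$. The sketch below (a generic random rate matrix and positive weight vector, assumed purely for illustration) checks the identity $A f + \varphi^{-1}\Gamma(\varphi, f) = \sum_y Q(x,y)\,\tfrac{\varphi(y)}{\varphi(x)}\,(f(y)-f(x))$ numerically.

```python
import numpy as np

rng = np.random.default_rng(5)

# Numerical check of the Doob h-transform identity in its discrete-state analogue
# (random rate matrix and weights; illustrative, not the paper's setup).
S = 6
Q = rng.random((S, S))
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))              # valid generator: rows sum to zero

phi = rng.random(S) + 0.5                        # positive reweighting function
f = rng.standard_normal(S)                       # arbitrary observable

A = lambda g: Q @ g                              # (A g)(x) = sum_y Q(x,y) (g(y) - g(x))
gamma = A(phi * f) - phi * A(f) - f * A(phi)     # carré du champ Gamma(phi, f)

lhs = A(f) + gamma / phi                         # Doob-transformed generator applied to f
Q_tilde = Q * phi[None, :] / phi[:, None]        # rates reweighted by phi(y)/phi(x)
np.fill_diagonal(Q_tilde, 0.0)
rhs = Q_tilde @ f - Q_tilde.sum(axis=1) * f      # sum_y Q~(x,y) (f(y) - f(x))

print("max |lhs - rhs| =", np.abs(lhs - rhs).max())   # ~ machine precision
```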

Summary Table: Key Components

| Component | Mathematical Tool | Unified Role |
| --- | --- | --- |
| Forward Process | Feller semigroup/generator | Progressive noising |
| Backward Process | Time-reversed generator | Denoising (sample synthesis) |
| Variational Loss | KL/measure discrepancy, Doob's h-transform | Model training/selection |
| Score Matching | Extended Hyvärinen objective | Score estimation, learning |
| General Generator Class | Lévy–Khintchine representation | Encompasses SDEs, jumps, Lévy processes |

Conclusion

The unified denoising perspective presented in this framework integrates the analysis and design of denoising Markov models—bringing together continuous diffusions, discrete jump processes, and more general Lévy dynamics—through rigorous generator theory, explicit time reversal, and variational formulation. By connecting measure transport, score-matching, and nonequilibrium statistical mechanics via the Doob h-transform, it provides both a general recipe for constructing denoising models and a foundation for further advancing generative modeling across complex, high-dimensional distributions (Ren et al., 2 Apr 2025).
