Fundamental Denoising Relation

Updated 29 December 2025
  • Fundamental denoising relation is a framework that links optimal denoisers with the score function and data geometry in the presence of Gaussian noise.
  • It underpins methods like denoising autoencoders and score-based generative models by rigorously capturing the structural information of noisy data.
  • The relation extends to uncertainty quantification and phase transitions in high-dimensional problems, offering practical insights for algorithm design.

The fundamental denoising relation precisely characterizes how optimal denoisers encode structural information about noisy data, establishing rigorous links between denoising estimators, data geometry, information-theoretic quantities, and modern generative modeling. This concept encompasses exact functional identities in Gaussian noise settings, connections to the score function of the data, and the scaling laws governing mean squared error (MSE) in structured and high-dimensional problems. These results provide the mathematical backbone for a wide spectrum of methods in machine learning, signal processing, uncertainty quantification, and generative modeling.

1. Exact Relation Between Optimal Denoisers and Data Distribution

Let $x\in\mathbb{R}^d$ be a sample from an unknown density $p(x)$, and let $\tilde x = x + \sigma \epsilon$ be its observation under additive Gaussian noise, with $\epsilon\sim\mathcal{N}(0,I)$ and $\sigma > 0$. The goal is to recover $x$ from $\tilde x$ using an estimator $g:\mathbb{R}^d\to\mathbb{R}^d$ that minimizes the mean squared error

$$L(g) = \mathbb{E}_{x, \tilde x}\left[\|x - g(\tilde x)\|^2\right].$$

The optimal denoiser $g^*$ satisfies the exact identity (the fundamental denoising relation) (Arponen et al., 2017): $g^*(\tilde x) = \tilde x + \sigma^2 \nabla_{\tilde x} \log p(\tilde x)$, where $p(\tilde x)=\int p(\tilde x\mid x)\,p(x)\,dx$ is the marginal density of the noisy observations. This result holds exactly for every $\sigma > 0$, generalizing earlier asymptotic (small-noise) formulas.
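As a minimal numerical sketch of this identity (not an example from the cited paper; the one-dimensional Gaussian-mixture prior, grid, and noise level below are illustrative assumptions), the MMSE denoiser $\mathbb{E}[x\mid \tilde x]$ computed by direct posterior integration can be compared with $\tilde x + \sigma^2 \tfrac{d}{d\tilde x}\log p(\tilde x)$ evaluated on the analytic noisy marginal:

```python
import numpy as np

# Illustrative 1-D setup (an assumption, not the paper's example): a two-component
# Gaussian-mixture prior corrupted by additive Gaussian noise of standard deviation sigma.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.5])
stds = np.array([0.5, 0.8])
sigma = 0.6

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def p_noisy(y):
    # Marginal of the noisy observation: each component's variance grows by sigma^2.
    s_tot = np.sqrt(stds ** 2 + sigma ** 2)
    return sum(w * gauss(y, m, s) for w, m, s in zip(weights, means, s_tot))

def mmse_denoiser(y, grid):
    # E[x | y] by numerical integration of the (unnormalized) posterior over a grid.
    prior = sum(w * gauss(grid, m, s) for w, m, s in zip(weights, means, stds))
    post = prior * gauss(y, grid, sigma)
    return np.sum(grid * post) / np.sum(post)

grid = np.linspace(-8.0, 8.0, 4001)
for y in [-2.5, 0.0, 1.0, 3.0]:
    eps = 1e-4  # finite-difference step for the score of the noisy marginal
    score = (np.log(p_noisy(y + eps)) - np.log(p_noisy(y - eps))) / (2.0 * eps)
    print(f"y={y:+.2f}  E[x|y]={mmse_denoiser(y, grid):.4f}  "
          f"y + sigma^2*score={y + sigma ** 2 * score:.4f}")
```

The two columns agree to the precision of the grid and finite-difference step, for any noise level, which is the content of the exact (not merely asymptotic) relation.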

2. Structural and Information-Theoretic Implications

This relation demonstrates that the optimal denoiser encodes the score $\nabla_{\tilde x} \log p(\tilde x)$ of the corrupted data distribution. Since this gradient field uniquely determines $p(\tilde x)$ up to normalization, the denoiser carries complete information about the structure of the data: denoising by MSE is formally equivalent to estimating the (noisy) data manifold geometry (Arponen et al., 2017). This connection underpins several modern unsupervised learning frameworks:

  • Denoising autoencoders: Learn mappings closely related to the score, enabling downstream representation learning.
  • Score-based generative models: Employ denoising relations to recover $\nabla \log p(\tilde x)$ and sample with Langevin dynamics or diffusion processes (a minimal sampling sketch follows this list).
  • Implicit density modeling: The invertibility of the relation allows, in principle, reconstruction of $p(\tilde x)$ and even (by deconvolution) $p(x)$ from $g^*$.
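The following sketch illustrates how a denoiser could be turned into a score estimate and used for unadjusted Langevin sampling of the noisy marginal; it is not a published model, and the step size, number of steps, and toy denoiser are placeholder assumptions (the toy denoiser is the exact MMSE estimator for a standard-normal prior, so the target distribution is known):

```python
import numpy as np

def score_from_denoiser(denoiser, x, sigma):
    # Fundamental relation rearranged: grad log p(x) = (g*(x) - x) / sigma^2.
    return (denoiser(x) - x) / sigma ** 2

def langevin_sample(denoiser, sigma, x0, step=1e-3, n_steps=2000, rng=None):
    # Unadjusted Langevin dynamics targeting the noisy marginal p_sigma.
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        s = score_from_denoiser(denoiser, x, sigma)
        x = x + step * s + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Toy check: for an N(0, 1) prior and noise std sigma, the exact MMSE denoiser is
# y / (1 + sigma^2) and the noisy marginal is N(0, 1 + sigma^2).
sigma = 0.5
denoiser = lambda y: y / (1 + sigma ** 2)
samples = np.array([langevin_sample(denoiser, sigma, np.zeros(1)) for _ in range(200)])
print("sample std:", samples.std(), " target std:", np.sqrt(1 + sigma ** 2))
```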

3. Small-Noise Limit and the Score Function

In the limit $\sigma\rightarrow 0$, $p(\tilde x)$ concentrates around $p(x)$, and the fundamental relation reduces to (Arponen et al., 2017, Manor et al., 2023)

$$g^*(\tilde x) - \tilde x \approx \sigma^2 \nabla_{\tilde x} \log p(\tilde x) + O(\sigma^4).$$

In this regime, the offset between the denoiser and the identity map yields an estimator of the (clean) score function of the data, which is central to score matching and diffusion-based generative models.
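A tiny check of this limit, under an assumed standard-normal prior for which both scores are available in closed form (purely illustrative): the clean score is $-x$, the noisy-marginal score is $-x/(1+\sigma^2)$, and the denoiser residual recovers the latter exactly while approaching the former as $\sigma\to 0$.

```python
import numpy as np

# Assumed prior: x ~ N(0, 1), so the clean score is -x and the exact MMSE denoiser
# is g*(y) = y / (1 + sigma^2).  The residual (g*(y) - y) / sigma^2 equals the
# noisy-marginal score and converges to the clean score as sigma -> 0.
y = 1.3
for sigma in [1.0, 0.3, 0.1, 0.03]:
    denoiser_out = y / (1 + sigma ** 2)
    residual_score = (denoiser_out - y) / sigma ** 2
    print(f"sigma={sigma:<4}  residual score={residual_score:+.5f}  clean score={-y:+.5f}")
```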

4. Moment Relations and Uncertainty Quantification

The classical (first-order) fundamental denoising relation can be extended to higher-order moments of the posterior $p(x\mid\tilde x)$ through an exact recursive formula (Manor et al., 2023). For the MSE-optimal denoiser $m(y) = \mathbb{E}[x\mid y]$, the $k$-th posterior central moment tensor $M^{(k)}(y)$ can be expressed in terms of derivatives of $m(y)$:

  • Univariate case:
    • $\mu_2(y) = \sigma^2 \mu_1'(y)$,
    • $\mu_3(y) = \sigma^2 \mu_2'(y)$,
    • $\mu_{k+1}(y) = \sigma^2 \mu_k'(y) + k\,\mu_{k-1}(y)\,\mu_2(y)$ (for $k\geq 3$).
  • Multivariate case:
    • $[M^{(2)}(y)]_{i_1,i_2} = \sigma^2\,\frac{\partial m_{i_1}}{\partial y_{i_2}}(y)$,
    • Recursively, $M^{(k+1)}(y) = \sigma^2\, \nabla_y\!\otimes M^{(k)}(y)\, +$ contractions involving lower-order moments and $M^{(2)}(y)$.

These identities allow extraction of posterior covariance, skewness, and higher moments solely from the Jacobian and higher-order derivatives of pre-trained denoisers, enabling uncertainty quantification and principal component analysis directly from the denoising function (Manor et al., 2023).
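A minimal one-dimensional sketch of this recipe (the standard-normal prior is an assumption chosen so that the posterior variance is known in closed form; a trained network would replace the toy denoiser, and an autodiff Jacobian would replace the finite difference):

```python
import numpy as np

# Identity mu_2(y) = sigma^2 * m'(y): posterior variance from the denoiser's derivative.
# Assumed setup: x ~ N(0, 1), noise std sigma, exact MMSE denoiser m(y) = y / (1 + sigma^2).
sigma = 0.7
m = lambda y: y / (1 + sigma ** 2)

def posterior_variance(denoiser, y, sigma, eps=1e-4):
    # Finite-difference derivative stands in for the Jacobian of a trained denoiser.
    d = (denoiser(y + eps) - denoiser(y - eps)) / (2.0 * eps)
    return sigma ** 2 * d

y = 0.9
print("from denoiser derivative:", posterior_variance(m, y, sigma))
print("closed form sigma^2 / (1 + sigma^2):", sigma ** 2 / (1 + sigma ** 2))
```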

5. MMSE Scaling Laws and Information Dimension

For broad classes of analog stationary processes corrupted by Gaussian noise, the minimum mean squared error (MMSE) in the small-noise regime is governed by the operational information dimension $d_I$ of the source (Zhou et al., 2019). For an observation $Y^n = X^n + Z^n$,

$$\lim_{\sigma\to 0} \frac{1}{\sigma^2} \mathbb{E}\left[(X_i - \hat X_i)^2\right] = d_I,$$

where $d_I$ quantifies the “continuous” degrees of freedom per sample, coinciding with the fraction of the source’s realization drawn from a continuous component.
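A Monte Carlo illustration of this scaling for an assumed i.i.d. Bernoulli-Gaussian (spike-and-slab) source, whose information dimension equals the slab probability $\gamma$ and whose conditional mean has a closed form (the parameters below are illustrative, and an i.i.d. source is used rather than one of the structured processes treated in the cited paper):

```python
import numpy as np

# Assumed source: X = B * G with B ~ Bernoulli(gamma), G ~ N(0, 1), so d_I = gamma
# (the fraction of realizations coming from the continuous component).
rng = np.random.default_rng(0)
gamma, n = 0.3, 400_000

def gauss(y, var):
    return np.exp(-0.5 * y ** 2 / var) / np.sqrt(2.0 * np.pi * var)

for sigma in [0.3, 0.1, 0.03]:
    x = rng.binomial(1, gamma, n) * rng.standard_normal(n)
    y = x + sigma * rng.standard_normal(n)
    # Exact conditional mean for this prior: a posterior mixture of the spike
    # (estimate 0) and the slab (estimate y / (1 + sigma^2)).
    slab = gamma * gauss(y, 1.0 + sigma ** 2)
    spike = (1.0 - gamma) * gauss(y, sigma ** 2)
    x_hat = slab / (slab + spike) * y / (1.0 + sigma ** 2)
    mmse = np.mean((x - x_hat) ** 2)
    print(f"sigma={sigma:<4}  MMSE/sigma^2 = {mmse / sigma ** 2:.3f}   (d_I = {gamma})")
```

As sigma decreases, the normalized MMSE approaches the information dimension $d_I = \gamma$.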

The Q-MAP denoiser achieves this optimal scaling in structured (including Markov and sparse) settings, using quantized, blockwise statistics that emphasize only structurally relevant patterns. This generalizes earlier results on scalar Rényi information dimension to structured processes and underpins practical, learning-based denoising for highly structured data (Zhou et al., 2019).

6. Proximal Denoising, Geometry, and Phase Transitions

In high-dimensional settings with a convex structural prior $f(x)$, the normalized mean squared error for the proximal denoising estimator

$$x^* = \arg\min_x \left\{ \frac{1}{2}\|y-x\|_2^2 + \sigma \lambda f(x)\right\}$$

admits an exact small-$\sigma$ limit (Oymak et al., 2013): $\lim_{\sigma\rightarrow 0} \mathrm{NMSE}(\sigma;\lambda) = \mathbb{E}\left[\mathrm{dist}^2(g, \lambda \partial f(x_0))\right]$, where $g\sim\mathcal{N}(0,I_n)$ and $\partial f(x_0)$ is the subdifferential at $x_0$. This supplies a geometric characterization of denoising performance, and tuning $\lambda$ to minimize the bound allows the regularized and constrained estimators to be compared. In linear inverse problems (LASSO/generalized LASSO), these results identify sharp phase transitions, i.e., the critical sample complexity at which recovery shifts from success to failure, governed by the statistical dimension or Gaussian mean width of the structural cone (Oymak et al., 2013).
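A Monte Carlo sketch of this limit for the $\ell_1$ prior $f(x)=\|x\|_1$, whose proximal operator is soft-thresholding and whose scaled subdifferential admits an explicit squared-distance formula (the dimensions, sparsity level, $\lambda$, and $\sigma$ below are illustrative assumptions):

```python
import numpy as np

# NMSE of soft-thresholding at small sigma vs. E[dist^2(g, lambda * subdiff ||.||_1 at x0)].
rng = np.random.default_rng(1)
n, k, lam, sigma = 2000, 100, 2.0, 1e-3

x0 = np.zeros(n)
x0[:k] = rng.choice([-1.0, 1.0], k) * (1.0 + rng.random(k))   # nonzeros well above the noise

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

nmse, dist2 = [], []
for _ in range(200):
    g = rng.standard_normal(n)
    x_star = soft(x0 + sigma * g, sigma * lam)                 # proximal denoiser output
    nmse.append(np.sum((x_star - x0) ** 2) / sigma ** 2)
    # Squared distance from g to lambda * subdifferential of the l1 norm at x0:
    on_support = np.sum((g[:k] - lam * np.sign(x0[:k])) ** 2)  # subgradient fixed at sign(x0)
    off_support = np.sum(np.maximum(np.abs(g[k:]) - lam, 0.0) ** 2)  # interval [-lam, lam]
    dist2.append(on_support + off_support)

print("empirical NMSE:", np.mean(nmse))
print("E[dist^2(g, lambda * subdifferential)]:", np.mean(dist2))
```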

7. Extensions, Corollaries, and Open Directions

The fundamental denoising relation and its extensions admit further generalizations:

  • Invertibility and density recovery: The relation between $g^*$ and $p(\tilde x)$ can be inverted along any path, enabling explicit reconstruction of $p(\tilde x)$ and, in principle, of the original uncorrupted density via deconvolution (Arponen et al., 2017); a one-dimensional sketch follows this list.
  • Beyond Gaussian noise: While the derivations rely on the additive Gaussian form, the basic estimator remains a conditional mean for arbitrary $p(\tilde x\mid x)$. Deriving analogous closed-form relations for other corruption models (multiplicative, dropout) is an open problem (Arponen et al., 2017).
  • Diffusion models: In modern score-based diffusion generative setups, the denoising-score coupling appears at each infinitesimal noise increment, justifying noise-level-dependent score estimation and sampling (Arponen et al., 2017).
  • Learning and computation: Learning-based denoisers, including Q-MAP architectures with empirical blockwise probability tables, can practically approach the MMSE-optimal scaling, even for complex high-dimensional data, by focusing on a small, structure-relevant subset of quantized patterns (Zhou et al., 2019).
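As an illustration of the invertibility point above (the standard-normal prior, grid, and exact toy denoiser are assumptions; a learned denoiser would take the toy one's place), the score implied by the denoiser can be integrated along a one-dimensional path and exponentiated to recover the noisy marginal:

```python
import numpy as np

# Recover p(y) from a denoiser by integrating the implied score along a path (1-D).
# Assumed setup: x ~ N(0, 1), noise std sigma, exact MMSE denoiser y / (1 + sigma^2),
# so the recovered density can be checked against the analytic noisy marginal.
sigma = 0.5
denoiser = lambda y: y / (1 + sigma ** 2)

grid = np.linspace(-6.0, 6.0, 2001)
dx = grid[1] - grid[0]
score = (denoiser(grid) - grid) / sigma ** 2                   # d/dy log p(y) from the relation
log_p = np.concatenate([[0.0], np.cumsum(0.5 * (score[1:] + score[:-1])) * dx])  # trapezoid rule
p = np.exp(log_p - log_p.max())
p /= np.sum(p) * dx                                            # normalize to a density

target = np.exp(-0.5 * grid ** 2 / (1 + sigma ** 2)) / np.sqrt(2.0 * np.pi * (1 + sigma ** 2))
print("max abs error vs analytic noisy marginal:", np.abs(p - target).max())
```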

The fundamental denoising relation therefore serves as a unifying framework for understanding the behavior and optimality of denoisers under Gaussian noise, bridges denoising with deep results from information theory and high-dimensional geometry, and informs practical algorithm design across unsupervised learning, generative modeling, and uncertainty quantification.
