Fundamental Denoising Relation

Updated 29 December 2025
  • Fundamental denoising relation is a framework that links optimal denoisers with the score function and data geometry in the presence of Gaussian noise.
  • It underpins methods like denoising autoencoders and score-based generative models by rigorously capturing the structural information of noisy data.
  • The relation extends to uncertainty quantification and phase transitions in high-dimensional problems, offering practical insights for algorithm design.

The fundamental denoising relation precisely characterizes how optimal denoisers encode structural information about noisy data, establishing rigorous links between denoising estimators, data geometry, information-theoretic quantities, and modern generative modeling. This concept encompasses exact functional identities in Gaussian noise settings, connections to the score function of the data, and the scaling laws governing mean squared error (MSE) in structured and high-dimensional problems. These results provide the mathematical backbone for a wide spectrum of methods in machine learning, signal processing, uncertainty quantification, and generative modeling.

1. Exact Relation Between Optimal Denoisers and Data Distribution

Let $x\in\mathbb{R}^d$ be a sample from an unknown density $p(x)$, and let $\tilde x = x + \sigma \epsilon$ be its observation under additive Gaussian noise, with $\epsilon\sim\mathcal{N}(0,I)$ and $\sigma > 0$. The goal is to recover $x$ from $\tilde x$ using an estimator $g:\mathbb{R}^d\to\mathbb{R}^d$ that minimizes the mean squared error

$$L(g) = \mathbb{E}_{x, \tilde x}\left[\|x - g(\tilde x)\|^2\right].$$

The optimal denoiser $g^*$ satisfies the exact identity (the fundamental denoising relation) (Arponen et al., 2017): $g^*(\tilde x) = \tilde x + \sigma^2 \nabla_{\tilde x} \log p(\tilde x)$, where $p(\tilde x)=\int p(\tilde x\mid x)\,p(x)\,dx$ is the marginal density of the noisy observations. This result holds exactly for every $\sigma > 0$, generalizing earlier asymptotic (small-noise) formulas.
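As a minimal numerical sketch of this identity (not an example from the cited paper; the one-dimensional Gaussian-mixture prior, grid, and noise level below are illustrative assumptions), the MMSE denoiser $\mathbb{E}[x\mid \tilde x]$ computed by direct posterior integration can be compared with $\tilde x + \sigma^2 \tfrac{d}{d\tilde x}\log p(\tilde x)$ evaluated on the analytic noisy marginal:

```python
import numpy as np

# Illustrative 1-D setup (an assumption, not the paper's example): a two-component
# Gaussian-mixture prior corrupted by additive Gaussian noise of standard deviation sigma.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.5])
stds = np.array([0.5, 0.8])
sigma = 0.6

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def p_noisy(y):
    # Marginal of the noisy observation: each component's variance grows by sigma^2.
    s_tot = np.sqrt(stds ** 2 + sigma ** 2)
    return sum(w * gauss(y, m, s) for w, m, s in zip(weights, means, s_tot))

def mmse_denoiser(y, grid):
    # E[x | y] by numerical integration of the (unnormalized) posterior over a grid.
    prior = sum(w * gauss(grid, m, s) for w, m, s in zip(weights, means, stds))
    post = prior * gauss(y, grid, sigma)
    return np.sum(grid * post) / np.sum(post)

grid = np.linspace(-8.0, 8.0, 4001)
for y in [-2.5, 0.0, 1.0, 3.0]:
    eps = 1e-4  # finite-difference step for the score of the noisy marginal
    score = (np.log(p_noisy(y + eps)) - np.log(p_noisy(y - eps))) / (2.0 * eps)
    print(f"y={y:+.2f}  E[x|y]={mmse_denoiser(y, grid):.4f}  "
          f"y + sigma^2*score={y + sigma ** 2 * score:.4f}")
```

The two columns agree to the precision of the grid and finite-difference step, for any noise level, which is the content of the exact (not merely asymptotic) relation.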

2. Structural and Information-Theoretic Implications

This relation demonstrates that the optimal denoiser encodes the score $\nabla_{\tilde x} \log p(\tilde x)$ of the corrupted data distribution. Since this gradient field uniquely determines $p(\tilde x)$ up to normalization, the denoiser carries complete information about the structure of the data: denoising by MSE is formally equivalent to estimating the (noisy) data manifold geometry (Arponen et al., 2017). This connection underpins several modern unsupervised learning frameworks:

  • Denoising autoencoders: Learn mappings closely related to the score, enabling downstream representation learning.
  • Score-based generative models: Employ denoising relations to recover $\nabla \log p(\tilde x)$ and sample with Langevin dynamics or diffusion processes (a minimal sampling sketch follows this list).
  • Implicit density modeling: The invertibility of the relation allows, in principle, reconstruction of $p(\tilde x)$ and even (by deconvolution) $p(x)$ from $g^*$.
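The following sketch illustrates how a denoiser could be turned into a score estimate and used for unadjusted Langevin sampling of the noisy marginal; it is not a published model, and the step size, number of steps, and toy denoiser are placeholder assumptions (the toy denoiser is the exact MMSE estimator for a standard-normal prior, so the target distribution is known):

```python
import numpy as np

def score_from_denoiser(denoiser, x, sigma):
    # Fundamental relation rearranged: grad log p(x) = (g*(x) - x) / sigma^2.
    return (denoiser(x) - x) / sigma ** 2

def langevin_sample(denoiser, sigma, x0, step=1e-3, n_steps=2000, rng=None):
    # Unadjusted Langevin dynamics targeting the noisy marginal p_sigma.
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        s = score_from_denoiser(denoiser, x, sigma)
        x = x + step * s + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
    return x

# Toy check: for an N(0, 1) prior and noise std sigma, the exact MMSE denoiser is
# y / (1 + sigma^2) and the noisy marginal is N(0, 1 + sigma^2).
sigma = 0.5
denoiser = lambda y: y / (1 + sigma ** 2)
samples = np.array([langevin_sample(denoiser, sigma, np.zeros(1)) for _ in range(200)])
print("sample std:", samples.std(), " target std:", np.sqrt(1 + sigma ** 2))
```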

3. Small-Noise Limit and the Score Function

In the limit $\sigma\rightarrow 0$, $p(\tilde x)$ concentrates around $p(x)$, and the fundamental relation reduces to (Arponen et al., 2017, Manor et al., 2023)

$$g^*(\tilde x) - \tilde x \approx \sigma^2 \nabla_{\tilde x} \log p(\tilde x) + O(\sigma^4).$$

In this regime, the offset between the denoiser and the identity map yields an estimator of the (clean) score function of the data, which is central to score matching and diffusion-based generative models.
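A tiny check of this limit, under an assumed standard-normal prior for which both scores are available in closed form (purely illustrative): the clean score is $-x$, the noisy-marginal score is $-x/(1+\sigma^2)$, and the denoiser residual recovers the latter exactly while approaching the former as $\sigma\to 0$.

```python
import numpy as np

# Assumed prior: x ~ N(0, 1), so the clean score is -x and the exact MMSE denoiser
# is g*(y) = y / (1 + sigma^2).  The residual (g*(y) - y) / sigma^2 equals the
# noisy-marginal score and converges to the clean score as sigma -> 0.
y = 1.3
for sigma in [1.0, 0.3, 0.1, 0.03]:
    denoiser_out = y / (1 + sigma ** 2)
    residual_score = (denoiser_out - y) / sigma ** 2
    print(f"sigma={sigma:<4}  residual score={residual_score:+.5f}  clean score={-y:+.5f}")
```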

4. Moment Relations and Uncertainty Quantification

The classical (first-order) fundamental denoising relation can be extended to higher-order moments of the posterior $p(x\mid\tilde x)$ through an exact recursive formula (Manor et al., 2023). For the MSE-optimal denoiser $m(y) = \mathbb{E}[x\mid y]$, the $k$-th posterior central moment tensor $M^{(k)}(y)$ can be expressed in terms of derivatives of $m(y)$:

  • Univariate case:
    • $\mu_2(y) = \sigma^2 \mu_1'(y)$,
    • $\mu_3(y) = \sigma^2 \mu_2'(y)$,
    • $\mu_{k+1}(y) = \sigma^2 \mu_k'(y) + k\,\mu_{k-1}(y)\,\mu_2(y)$ (for $k\geq 3$).
  • Multivariate case:
    • $[M^{(2)}(y)]_{i_1,i_2} = \sigma^2\,\frac{\partial m_{i_1}}{\partial y_{i_2}}(y)$,
    • Recursively, $M^{(k+1)}(y) = \sigma^2\, \nabla_y\!\otimes M^{(k)}(y)\, +$ contractions involving lower-order moments and $M^{(2)}(y)$.

These identities allow extraction of posterior covariance, skewness, and higher moments solely from the Jacobian and higher-order derivatives of pre-trained denoisers, enabling uncertainty quantification and principal component analysis directly from the denoising function (Manor et al., 2023).
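A minimal one-dimensional sketch of this recipe (the standard-normal prior is an assumption chosen so that the posterior variance is known in closed form; a trained network would replace the toy denoiser, and an autodiff Jacobian would replace the finite difference):

```python
import numpy as np

# Identity mu_2(y) = sigma^2 * m'(y): posterior variance from the denoiser's derivative.
# Assumed setup: x ~ N(0, 1), noise std sigma, exact MMSE denoiser m(y) = y / (1 + sigma^2).
sigma = 0.7
m = lambda y: y / (1 + sigma ** 2)

def posterior_variance(denoiser, y, sigma, eps=1e-4):
    # Finite-difference derivative stands in for the Jacobian of a trained denoiser.
    d = (denoiser(y + eps) - denoiser(y - eps)) / (2.0 * eps)
    return sigma ** 2 * d

y = 0.9
print("from denoiser derivative:", posterior_variance(m, y, sigma))
print("closed form sigma^2 / (1 + sigma^2):", sigma ** 2 / (1 + sigma ** 2))
```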

5. MMSE Scaling Laws and Information Dimension

For broad classes of analog stationary processes corrupted by Gaussian noise, the minimum mean squared error (MMSE) in the small-noise regime is governed by the operational information dimension $d_I$ of the source (Zhou et al., 2019). For an observation $Y^n = X^n + Z^n$,

$$\lim_{\sigma\to 0} \frac{1}{\sigma^2} \mathbb{E}\left[(X_i - \hat X_i)^2\right] = d_I,$$

where $d_I$ quantifies the “continuous” degrees of freedom per sample, coinciding with the fraction of the source’s realization drawn from a continuous component.
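A Monte Carlo illustration of this scaling for an assumed i.i.d. Bernoulli-Gaussian (spike-and-slab) source, whose information dimension equals the slab probability $\gamma$ and whose conditional mean has a closed form (the parameters below are illustrative, and an i.i.d. source is used rather than one of the structured processes treated in the cited paper):

```python
import numpy as np

# Assumed source: X = B * G with B ~ Bernoulli(gamma), G ~ N(0, 1), so d_I = gamma
# (the fraction of realizations coming from the continuous component).
rng = np.random.default_rng(0)
gamma, n = 0.3, 400_000

def gauss(y, var):
    return np.exp(-0.5 * y ** 2 / var) / np.sqrt(2.0 * np.pi * var)

for sigma in [0.3, 0.1, 0.03]:
    x = rng.binomial(1, gamma, n) * rng.standard_normal(n)
    y = x + sigma * rng.standard_normal(n)
    # Exact conditional mean for this prior: a posterior mixture of the spike
    # (estimate 0) and the slab (estimate y / (1 + sigma^2)).
    slab = gamma * gauss(y, 1.0 + sigma ** 2)
    spike = (1.0 - gamma) * gauss(y, sigma ** 2)
    x_hat = slab / (slab + spike) * y / (1.0 + sigma ** 2)
    mmse = np.mean((x - x_hat) ** 2)
    print(f"sigma={sigma:<4}  MMSE/sigma^2 = {mmse / sigma ** 2:.3f}   (d_I = {gamma})")
```

As sigma decreases, the normalized MMSE approaches the information dimension $d_I = \gamma$.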

The Q-MAP denoiser achieves this optimal scaling in structured (including Markov and sparse) settings, using quantized, blockwise statistics that emphasize only structurally relevant patterns. This generalizes earlier results on scalar Rényi information dimension to structured processes and underpins practical, learning-based denoising for highly structured data (Zhou et al., 2019).

6. Proximal Denoising, Geometry, and Phase Transitions

In high-dimensional settings with a convex structural prior $f(x)$, the normalized mean squared error for the proximal denoising estimator

$$x^* = \arg\min_x \left\{ \frac{1}{2}\|y-x\|_2^2 + \sigma \lambda f(x)\right\}$$

admits an exact small-$\sigma$ limit (Oymak et al., 2013): $\lim_{\sigma\rightarrow 0} \mathrm{NMSE}(\sigma;\lambda) = \mathbb{E}\left[\mathrm{dist}^2(g, \lambda \partial f(x_0))\right]$, where $g\sim\mathcal{N}(0,I_n)$ and $\partial f(x_0)$ is the subdifferential at $x_0$. This supplies a geometric characterization of denoising performance, and tuning $\lambda$ to minimize the bound allows the regularized and constrained estimators to be compared. In linear inverse problems (LASSO/generalized LASSO), these results identify sharp phase transitions, i.e., the critical sample complexity at which recovery shifts from success to failure, governed by the statistical dimension or Gaussian mean width of the structural cone (Oymak et al., 2013).
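A Monte Carlo sketch of this limit for the $\ell_1$ prior $f(x)=\|x\|_1$, whose proximal operator is soft-thresholding and whose scaled subdifferential admits an explicit squared-distance formula (the dimensions, sparsity level, $\lambda$, and $\sigma$ below are illustrative assumptions):

```python
import numpy as np

# NMSE of soft-thresholding at small sigma vs. E[dist^2(g, lambda * subdiff ||.||_1 at x0)].
rng = np.random.default_rng(1)
n, k, lam, sigma = 2000, 100, 2.0, 1e-3

x0 = np.zeros(n)
x0[:k] = rng.choice([-1.0, 1.0], k) * (1.0 + rng.random(k))   # nonzeros well above the noise

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

nmse, dist2 = [], []
for _ in range(200):
    g = rng.standard_normal(n)
    x_star = soft(x0 + sigma * g, sigma * lam)                 # proximal denoiser output
    nmse.append(np.sum((x_star - x0) ** 2) / sigma ** 2)
    # Squared distance from g to lambda * subdifferential of the l1 norm at x0:
    on_support = np.sum((g[:k] - lam * np.sign(x0[:k])) ** 2)  # subgradient fixed at sign(x0)
    off_support = np.sum(np.maximum(np.abs(g[k:]) - lam, 0.0) ** 2)  # interval [-lam, lam]
    dist2.append(on_support + off_support)

print("empirical NMSE:", np.mean(nmse))
print("E[dist^2(g, lambda * subdifferential)]:", np.mean(dist2))
```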

7. Extensions, Corollaries, and Open Directions

The fundamental denoising relation and its extensions admit further generalizations:

  • Invertibility and density recovery: The relation between $g^*$ and $p(\tilde x)$ can be inverted along any path, enabling explicit reconstruction of $p(\tilde x)$ and, in principle, of the original uncorrupted density via deconvolution (Arponen et al., 2017); a one-dimensional sketch follows this list.
  • Beyond Gaussian noise: While the derivations rely on the additive Gaussian form, the basic estimator remains a conditional mean for arbitrary $p(\tilde x\mid x)$. Deriving analogous closed-form relations for other corruption models (multiplicative, dropout) is an open problem (Arponen et al., 2017).
  • Diffusion models: In modern score-based diffusion generative setups, the denoising-score coupling appears at each infinitesimal noise increment, justifying noise-level-dependent score estimation and sampling (Arponen et al., 2017).
  • Learning and computation: Learning-based denoisers, including Q-MAP architectures with empirical blockwise probability tables, can practically approach the MMSE-optimal scaling, even for complex high-dimensional data, by focusing on a small, structure-relevant subset of quantized patterns (Zhou et al., 2019).
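As an illustration of the invertibility point above (the standard-normal prior, grid, and exact toy denoiser are assumptions; a learned denoiser would take the toy one's place), the score implied by the denoiser can be integrated along a one-dimensional path and exponentiated to recover the noisy marginal:

```python
import numpy as np

# Recover p(y) from a denoiser by integrating the implied score along a path (1-D).
# Assumed setup: x ~ N(0, 1), noise std sigma, exact MMSE denoiser y / (1 + sigma^2),
# so the recovered density can be checked against the analytic noisy marginal.
sigma = 0.5
denoiser = lambda y: y / (1 + sigma ** 2)

grid = np.linspace(-6.0, 6.0, 2001)
dx = grid[1] - grid[0]
score = (denoiser(grid) - grid) / sigma ** 2                   # d/dy log p(y) from the relation
log_p = np.concatenate([[0.0], np.cumsum(0.5 * (score[1:] + score[:-1])) * dx])  # trapezoid rule
p = np.exp(log_p - log_p.max())
p /= np.sum(p) * dx                                            # normalize to a density

target = np.exp(-0.5 * grid ** 2 / (1 + sigma ** 2)) / np.sqrt(2.0 * np.pi * (1 + sigma ** 2))
print("max abs error vs analytic noisy marginal:", np.abs(p - target).max())
```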

The fundamental denoising relation therefore serves as a unifying framework for understanding the behavior and optimality of denoisers under Gaussian noise, bridges denoising with deep results from information theory and high-dimensional geometry, and informs practical algorithm design across unsupervised learning, generative modeling, and uncertainty quantification.
