
Information-Estimation Metric (IEM)

Updated 6 October 2025
  • Information-Estimation Metric is a data-driven, distribution-adaptive metric that quantifies differences between signals by integrating optimal denoising errors across noise scales.
  • It builds on the I–MMSE relation and Tweedie–Miyasawa formula to translate score field variations into a global metric, bridging local and global data structures.
  • In practice, the metric is computed with learned denoiser networks and Monte Carlo integration, and it has proven effective in tasks such as image quality assessment and unsupervised clustering.

The Information-Estimation Metric (IEM) is a data-driven, distribution-adaptive metric derived from a fundamental relationship between information theory and estimation theory. It quantifies the distance between signals—such as images, time series, or general vectors—by comparing their optimal denoising error vectors under Gaussian noise perturbations, integrated across a range of noise amplitudes. The resulting measure is a global metric that interpolates between classic Mahalanobis distance (for Gaussian distributions) and a learned, non-Euclidean geometry for more complex distributions, offering both local and global adaptability to the underlying probability structure of the data (Ohayon et al., 2 Oct 2025).

1. Information-Theoretic and Estimation-Theoretic Foundation

The IEM is rooted in the pointwise I–MMSE relation and the Tweedie–Miyasawa formula. The I–MMSE relation shows that the log-probability of a sample (its surprisal) can be written as an integral, over noise levels, of the error of the optimal denoiser, i.e., the minimum mean squared error (MMSE). Specifically, when a signal $x$ is observed through a Gaussian channel $y_\gamma = x + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \gamma I)$, the expected error of the best estimator of $x$ from $y_\gamma$,

$$\text{MMSE}(x, \gamma) = \mathbb{E}\left[ \left\| x - \mathbb{E}[x \mid y_\gamma] \right\|^2 \right],$$

carries information about $p(x)$. The Tweedie–Miyasawa formula in turn connects the optimal denoiser to the gradient ("score") of the log of the blurred density,

$$\mathbb{E}[x \mid y_\gamma] = y_\gamma + \gamma \nabla \log p_\gamma(y_\gamma),$$

where $p_\gamma$ is the density of $x$ blurred by Gaussian noise of variance $\gamma$.
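As a concrete illustration of the Tweedie–Miyasawa relation, the following toy sketch (an illustrative example, not the paper's code) checks it on a one-dimensional Gaussian, where both the posterior mean and the blurred score are available in closed form.

```python
import numpy as np

# Toy model: x ~ N(mu, s2); observation y = x + eps with eps ~ N(0, gamma).
mu, s2, gamma = 1.0, 2.0, 0.5

def score_blurred(y):
    # Blurred density p_gamma = N(mu, s2 + gamma); its score at y.
    return -(y - mu) / (s2 + gamma)

def posterior_mean(y):
    # Optimal (MMSE) denoiser for this Gaussian model: E[x | y].
    return mu + s2 / (s2 + gamma) * (y - mu)

y = np.linspace(-3.0, 5.0, 9)
lhs = posterior_mean(y)             # E[x | y]
rhs = y + gamma * score_blurred(y)  # Tweedie–Miyasawa: y + gamma * d/dy log p_gamma(y)
assert np.allclose(lhs, rhs)
```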

By integrating over all such noise levels, the IEM encodes both the local curvature and global structure of the underlying data distribution.

2. Mathematical Construction

The IEM between two signals $x_1$ and $x_2$ is defined as:

$$\mathrm{IEM}(x_1, x_2; \Gamma) = \left[ \int_0^{\Gamma} \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \gamma I)} \left\| \nabla \log p_{\gamma}(x_1 + \epsilon) - \nabla \log p_{\gamma}(x_2 + \epsilon) \right\|^2 d\gamma \right]^{1/2}$$

  • $p_\gamma(\cdot)$ is the data density convolved with Gaussian noise of variance $\gamma$.
  • The expectation is over the added noise $\epsilon$, shared by both signals, at each integration level.
  • The upper limit $\Gamma$ sets the maximum blur scale considered.
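A minimal Monte Carlo sketch of this definition is given below, assuming access to the blurred score $\nabla \log p_\gamma$ as a callable `score_fn(y, gamma)`; in practice this callable would be derived from a learned denoiser (Section 4), and the grid size and sample counts here are arbitrary illustrative choices.

```python
import numpy as np

def iem(x1, x2, score_fn, Gamma, n_gammas=64, n_mc=32, rng=None):
    """Monte Carlo estimate of IEM(x1, x2; Gamma).

    score_fn(y, gamma) must return grad_y log p_gamma(y) for a noisy input y.
    """
    rng = np.random.default_rng(rng)
    # Midpoint rule over noise variances in (0, Gamma].
    step = Gamma / n_gammas
    gammas = (np.arange(n_gammas) + 0.5) * step
    total = 0.0
    for gamma in gammas:
        acc = 0.0
        for _ in range(n_mc):
            # The same noise realization perturbs both signals at this level.
            eps = rng.normal(0.0, np.sqrt(gamma), size=np.shape(x1))
            diff = score_fn(x1 + eps, gamma) - score_fn(x2 + eps, gamma)
            acc += np.sum(diff * diff)
        total += (acc / n_mc) * step
    return np.sqrt(total)
```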

For infinitesimal differences ($x_2 = x_1 + \delta$), the IEM’s local expansion yields a Riemannian metric:

$$\mathrm{IEM}^2(x, x+\delta) = \delta^\top G(x, \Gamma)\, \delta + o(\|\delta\|^2), \quad \text{where}$$

$$G(x, \Gamma) = \int_0^{\Gamma} \mathbb{E}_{\epsilon}\left[ \left( \nabla^2 \log p_\gamma(x + \epsilon) \right)^2 \right] d\gamma$$

or equivalently, via the second-order Tweedie relation $\mathrm{Cov}(x \mid y_\gamma) = \gamma^2 \nabla^2 \log p_\gamma(y_\gamma) + \gamma I$,

$$G(x, \Gamma) = \int_0^{\Gamma} \frac{1}{\gamma^4}\, \mathbb{E}_{\epsilon}\left[ \left( \mathrm{Cov}(x \mid x+\epsilon) - \gamma I \right)^2 \right] d\gamma$$

where $G(x, \Gamma)$ is the local metric tensor accumulated over noise levels.

For Gaussian distributions, the score field is linear and the IEM reduces globally and locally to the Mahalanobis distance.
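As a quick numerical check of this reduction (a toy sketch, not the paper's experiment), the blurred score of a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ has the closed form $\nabla \log p_\gamma(y) = -(\Sigma + \gamma I)^{-1} y$, so the defining integral can be evaluated by quadrature and compared with the squared Mahalanobis distance:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3
A = rng.normal(size=(dim, dim))
Sigma = A @ A.T + np.eye(dim)          # positive-definite covariance of a zero-mean Gaussian
x1, x2 = rng.normal(size=dim), rng.normal(size=dim)
d = x1 - x2

# The Gaussian's blurred score is affine, so the noise term cancels in the score
# difference and the IEM integrand reduces to d^T (Sigma + gamma I)^{-2} d.
Gamma, n = 1e3, 4000
step = Gamma / n
gammas = (np.arange(n) + 0.5) * step   # midpoint rule over noise variances
iem_sq = sum(
    d @ np.linalg.solve((Sigma + g * np.eye(dim)) @ (Sigma + g * np.eye(dim)), d)
    for g in gammas
) * step

mahalanobis_sq = d @ np.linalg.solve(Sigma, d)
print(iem_sq, mahalanobis_sq)          # agree to within a few percent for large Gamma
```

The agreement follows from $\int_0^{\infty} (\Sigma + \gamma I)^{-2}\, d\gamma = \Sigma^{-1}$, which is why the limit $\Gamma \to \infty$ recovers the Mahalanobis metric exactly.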

3. Geometric and Statistical Interpretation

The IEM generalizes classical Euclidean and Mahalanobis distances by intrinsically adapting to the data distribution. It operates via the geometry of score fields—gradients of the log-probability evaluated under increasing blur. This means that:

  • For Gaussian data, the score field is affine and the metric tensor is constant, so the IEM matches the Mahalanobis distance.
  • For multi-modal, heavy-tailed, or non-Gaussian data, the score field's inhomogeneity results in a metric that varies both locally (local curvature) and globally (adapting to modes and low-probability regions).

This adaptation arises because the metric accumulates squared differences between score vectors as the noise level is swept, providing a sensitivity to distributional geometry that fixed metrics such as the Euclidean distance lack.

4. Practical Computation via Denoisers

Since $p(x)$ and its score are typically unknown in practical applications, the IEM is estimated via a learned denoiser network analogous to those used in score-based generative diffusion models:

  • A denoiser $D_\tau(\cdot, \gamma)$ is trained to recover $x$ from noisy samples $x + \epsilon$, for various $\gamma$.
  • The Tweedie–Miyasawa formula relates the residual $x + \epsilon - D_\tau(x + \epsilon, \gamma)$ directly to the score of the blurred density.
  • For each noise level $\gamma$, this enables numerical estimation of the score difference between $x_1$ and $x_2$ via denoiser outputs.
  • The IEM is then computed as a Monte Carlo estimate of the one-dimensional (over $\gamma$) integral of squared score differences.
  • The integral is discretized (e.g., with Euler–Maruyama-style steps), and Brownian motion samples provide the noise trajectory across levels.

For Gaussian data, a linear denoiser suffices; for more complex data, a neural network denoiser must be trained over noise levels.
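The following sketch assembles these steps around a generic pretrained `denoiser(y, gamma)` callable; the function name, the single Brownian noise trajectory, and the uniform step size are illustrative assumptions rather than the paper's exact discretization.

```python
import numpy as np

def score_from_denoiser(denoiser, y, gamma):
    # Tweedie–Miyasawa: grad log p_gamma(y) = (D(y, gamma) - y) / gamma.
    return (denoiser(y, gamma) - y) / gamma

def iem_from_denoiser(x1, x2, denoiser, Gamma, n_steps=64, rng=None):
    """Estimate IEM(x1, x2; Gamma) along one correlated noise trajectory.

    The perturbation is built up incrementally (a Brownian-like path), so the
    same noise realization is shared by x1 and x2 at every noise level.
    """
    rng = np.random.default_rng(rng)
    dgamma = Gamma / n_steps
    eps = np.zeros_like(np.asarray(x1, dtype=float))
    total = 0.0
    for k in range(n_steps):
        eps = eps + rng.normal(0.0, np.sqrt(dgamma), size=eps.shape)  # variance grows by dgamma
        gamma = (k + 1) * dgamma
        diff = (score_from_denoiser(denoiser, x1 + eps, gamma)
                - score_from_denoiser(denoiser, x2 + eps, gamma))
        total += np.sum(diff * diff) * dgamma
    return np.sqrt(total)
```

Averaging the squared estimate over several independent trajectories reduces the Monte Carlo variance; with a diffusion-style neural denoiser, the calls to `denoiser` would typically be batched over noise levels.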

5. Applications and Empirical Evaluation

The IEM has been evaluated for image similarity, perceptual quality, and representation learning:

  • A diffusion-style denoiser (Hourglass Diffusion Transformer architecture) was trained on ImageNet, so the resulting IEM is learned in a fully unsupervised manner (without access to human perceptual data or labels).
  • On datasets such as TID2013, LIVE, CSIQ, and TQD, the learned IEM matched or exceeded the performance of state-of-the-art supervised metrics in predicting human mean opinion scores and forced-choice judgments.
  • The metric naturally adapts to both global and local distortions, offering improved sensitivity to subtle differences closely aligned with human perception.

Beyond image quality assessment, the IEM framework enables unsupervised clustering, data retrieval, representation learning, and potentially objective tuning for image restoration or compression, wherever a data-adaptive metric is needed.

6. Theoretical Connections and Generalizations

For $\Gamma \to \infty$, the average local metric satisfies:

$$\mathbb{E}\left[ -\nabla^2 \log p(x) \right] = \mathbb{E}\left[ \nabla \log p(x)\, \nabla \log p(x)^\top \right]$$
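This identity follows from integration by parts, assuming $p$ is smooth and decays at infinity so that the boundary term vanishes:

$$\mathbb{E}\left[ -\nabla^2 \log p(x) \right] = -\int p(x)\, \nabla^2 \log p(x)\, dx = \int \nabla p(x)\, \nabla \log p(x)^\top dx = \mathbb{E}\left[ \nabla \log p(x)\, \nabla \log p(x)^\top \right].$$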

This links the IEM to classical Fisher information identities. For Gaussian signals, the IEM and Mahalanobis coincide globally and locally. For other distributions, the IEM defines a distribution-adaptive Riemannian geometry that incorporates higher-order structure. The approach generalizes to any continuous domain and can be transferred to domains such as speech, time series, or medical imaging, provided a suitable denoiser is available and a sufficiently rich family of noise levels is sampled.

7. Summary Table: Theoretical and Practical Properties

| Property | Gaussian Case | General Distribution |
|---|---|---|
| Local metric | Mahalanobis | Data-adaptive |
| Global metric | Mahalanobis | Integrated over the score field |
| Computational requirements | Closed-form | Requires a learned denoiser and Monte Carlo integration |
| Geometric nature | Linear/elliptical | Nonlinear/Riemannian |

The IEM’s formulation unifies information, estimation, and geometry, permitting data-dependent metrics to be learned directly from the data distribution; on complex natural data, such metrics match or outperform supervised and hand-crafted alternatives (Ohayon et al., 2 Oct 2025).
