
Information-Estimation Metric (IEM)

Updated 6 October 2025
  • Information-Estimation Metric is a data-driven, distribution-adaptive metric that quantifies differences between signals by integrating optimal denoising errors across noise scales.
  • It builds on the I–MMSE relation and Tweedie–Miyasawa formula to translate score field variations into a global metric, bridging local and global data structures.
  • In practice, the metric is computed with learned denoiser networks and Monte Carlo integration, and it has proven effective in tasks such as image quality assessment and unsupervised clustering.

The Information-Estimation Metric (IEM) is a data-driven, distribution-adaptive metric derived from a fundamental relationship between information theory and estimation theory. It quantifies the distance between signals—such as images, time series, or general vectors—by comparing their optimal denoising error vectors under Gaussian noise perturbations, integrated across a range of noise amplitudes. The resulting measure is a global metric that interpolates between classic Mahalanobis distance (for Gaussian distributions) and a learned, non-Euclidean geometry for more complex distributions, offering both local and global adaptability to the underlying probability structure of the data (Ohayon et al., 2 Oct 2025).

1. Information-Theoretic and Estimation-Theoretic Foundation

The IEM is rooted in the pointwise I–MMSE relation and the Tweedie–Miyasawa formula. The I–MMSE relation shows that the log-probability of a sample (its surprisal) can be written as an integral, over noise levels, of the error of the optimal denoiser, i.e., the minimum mean squared error (MMSE). Specifically, when a signal $x$ is observed through a Gaussian channel $y_\gamma = x + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \gamma I)$, the expected error of the best estimator of $x$ from $y_\gamma$,

$$\text{MMSE}(x, \gamma) = \mathbb{E}\left[ \left\| x - \mathbb{E}[x \mid y_\gamma] \right\|^2 \right],$$

carries information about $p(x)$. The Tweedie–Miyasawa formula in turn connects the optimal denoiser to the gradient ("score") of the log of the blurred density,

$$\mathbb{E}[x \mid y_\gamma] = y_\gamma + \gamma \nabla \log p_\gamma(y_\gamma),$$

where $p_\gamma$ is the density of $x$ blurred by Gaussian noise of variance $\gamma$.
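As a concrete illustration of the Tweedie–Miyasawa relation, the following toy sketch (an illustrative example, not the paper's code) checks it on a one-dimensional Gaussian, where both the posterior mean and the blurred score are available in closed form.

```python
import numpy as np

# Toy model: x ~ N(mu, s2); observation y = x + eps with eps ~ N(0, gamma).
mu, s2, gamma = 1.0, 2.0, 0.5

def score_blurred(y):
    # Blurred density p_gamma = N(mu, s2 + gamma); its score at y.
    return -(y - mu) / (s2 + gamma)

def posterior_mean(y):
    # Optimal (MMSE) denoiser for this Gaussian model: E[x | y].
    return mu + s2 / (s2 + gamma) * (y - mu)

y = np.linspace(-3.0, 5.0, 9)
lhs = posterior_mean(y)             # E[x | y]
rhs = y + gamma * score_blurred(y)  # Tweedie–Miyasawa: y + gamma * d/dy log p_gamma(y)
assert np.allclose(lhs, rhs)
```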

By integrating over all such noise levels, the IEM encodes both the local curvature and global structure of the underlying data distribution.

2. Mathematical Construction

The IEM between two signals $x_1$ and $x_2$ is defined as:

$$\mathrm{IEM}(x_1, x_2; \Gamma) = \left[ \int_0^{\Gamma} \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \gamma I)} \left\| \nabla \log p_{\gamma}(x_1 + \epsilon) - \nabla \log p_{\gamma}(x_2 + \epsilon) \right\|^2 d\gamma \right]^{1/2}$$

  • $p_\gamma(\cdot)$ is the data density convolved with Gaussian noise of variance $\gamma$.
  • The expectation is over the added noise $\epsilon$, shared by both signals, at each integration level.
  • The upper limit $\Gamma$ sets the maximum blur scale considered.
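A minimal Monte Carlo sketch of this definition is given below, assuming access to the blurred score $\nabla \log p_\gamma$ as a callable `score_fn(y, gamma)`; in practice this callable would be derived from a learned denoiser (Section 4), and the grid size and sample counts here are arbitrary illustrative choices.

```python
import numpy as np

def iem(x1, x2, score_fn, Gamma, n_gammas=64, n_mc=32, rng=None):
    """Monte Carlo estimate of IEM(x1, x2; Gamma).

    score_fn(y, gamma) must return grad_y log p_gamma(y) for a noisy input y.
    """
    rng = np.random.default_rng(rng)
    # Midpoint rule over noise variances in (0, Gamma].
    step = Gamma / n_gammas
    gammas = (np.arange(n_gammas) + 0.5) * step
    total = 0.0
    for gamma in gammas:
        acc = 0.0
        for _ in range(n_mc):
            # The same noise realization perturbs both signals at this level.
            eps = rng.normal(0.0, np.sqrt(gamma), size=np.shape(x1))
            diff = score_fn(x1 + eps, gamma) - score_fn(x2 + eps, gamma)
            acc += np.sum(diff * diff)
        total += (acc / n_mc) * step
    return np.sqrt(total)
```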

For infinitesimal differences ($x_2 = x_1 + \delta$), the IEM’s local expansion yields a Riemannian metric:

$$\mathrm{IEM}^2(x, x+\delta) = \delta^\top G(x, \Gamma)\, \delta + o(\|\delta\|^2), \quad \text{where}$$

$$G(x, \Gamma) = \int_0^{\Gamma} \mathbb{E}_{\epsilon}\left[ \left( \nabla^2 \log p_\gamma(x + \epsilon) \right)^2 \right] d\gamma$$

or equivalently, via the second-order Tweedie relation $\mathrm{Cov}(x \mid y_\gamma) = \gamma^2 \nabla^2 \log p_\gamma(y_\gamma) + \gamma I$,

$$G(x, \Gamma) = \int_0^{\Gamma} \frac{1}{\gamma^4}\, \mathbb{E}_{\epsilon}\left[ \left( \mathrm{Cov}(x \mid x+\epsilon) - \gamma I \right)^2 \right] d\gamma$$

where $G(x, \Gamma)$ is the local metric tensor accumulated over noise levels.

For Gaussian distributions, the score field is linear and the IEM reduces globally and locally to the Mahalanobis distance.
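As a quick numerical check of this reduction (a toy sketch, not the paper's experiment), the blurred score of a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ has the closed form $\nabla \log p_\gamma(y) = -(\Sigma + \gamma I)^{-1} y$, so the defining integral can be evaluated by quadrature and compared with the squared Mahalanobis distance:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3
A = rng.normal(size=(dim, dim))
Sigma = A @ A.T + np.eye(dim)          # positive-definite covariance of a zero-mean Gaussian
x1, x2 = rng.normal(size=dim), rng.normal(size=dim)
d = x1 - x2

# The Gaussian's blurred score is affine, so the noise term cancels in the score
# difference and the IEM integrand reduces to d^T (Sigma + gamma I)^{-2} d.
Gamma, n = 1e3, 4000
step = Gamma / n
gammas = (np.arange(n) + 0.5) * step   # midpoint rule over noise variances
iem_sq = sum(
    d @ np.linalg.solve((Sigma + g * np.eye(dim)) @ (Sigma + g * np.eye(dim)), d)
    for g in gammas
) * step

mahalanobis_sq = d @ np.linalg.solve(Sigma, d)
print(iem_sq, mahalanobis_sq)          # agree to within a few percent for large Gamma
```

The agreement follows from $\int_0^{\infty} (\Sigma + \gamma I)^{-2}\, d\gamma = \Sigma^{-1}$, which is why the limit $\Gamma \to \infty$ recovers the Mahalanobis metric exactly.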

3. Geometric and Statistical Interpretation

The IEM generalizes classical Euclidean and Mahalanobis distances by intrinsically adapting to the data distribution. It operates via the geometry of score fields—gradients of the log-probability evaluated under increasing blur. This means that:

  • For Gaussian data, the score field is affine and the metric tensor is constant, so the IEM matches the Mahalanobis distance.
  • For multi-modal, heavy-tailed, or non-Gaussian data, the score field's inhomogeneity results in a metric that varies both locally (local curvature) and globally (adapting to modes and low-probability regions).

This adaptation arises because the metric accumulates squared differences between score vectors as the noise level is swept, providing a sensitivity to distributional geometry that fixed metrics such as the Euclidean distance lack.

4. Practical Computation via Denoisers

Since $p(x)$ and its score are typically unknown in practical applications, the IEM is estimated via a learned denoiser network analogous to those used in score-based generative diffusion models:

  • A denoiser $D_\tau(\cdot, \gamma)$ is trained to recover $x$ from noisy samples $x + \epsilon$, for various $\gamma$.
  • The Tweedie–Miyasawa formula relates the residual $x + \epsilon - D_\tau(x + \epsilon, \gamma)$ directly to the score of the blurred density.
  • For each noise level $\gamma$, this enables numerical estimation of the score difference between $x_1$ and $x_2$ via denoiser outputs.
  • The IEM is then computed as a Monte Carlo estimate of the one-dimensional (over $\gamma$) integral of squared score differences.
  • The integral is discretized (e.g., with Euler–Maruyama-style steps), and Brownian motion samples provide the noise trajectory across levels.

For Gaussian data, a linear denoiser suffices; for more complex data, a neural network denoiser must be trained over noise levels.
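The following sketch assembles these steps around a generic pretrained `denoiser(y, gamma)` callable; the function name, the single Brownian noise trajectory, and the uniform step size are illustrative assumptions rather than the paper's exact discretization.

```python
import numpy as np

def score_from_denoiser(denoiser, y, gamma):
    # Tweedie–Miyasawa: grad log p_gamma(y) = (D(y, gamma) - y) / gamma.
    return (denoiser(y, gamma) - y) / gamma

def iem_from_denoiser(x1, x2, denoiser, Gamma, n_steps=64, rng=None):
    """Estimate IEM(x1, x2; Gamma) along one correlated noise trajectory.

    The perturbation is built up incrementally (a Brownian-like path), so the
    same noise realization is shared by x1 and x2 at every noise level.
    """
    rng = np.random.default_rng(rng)
    dgamma = Gamma / n_steps
    eps = np.zeros_like(np.asarray(x1, dtype=float))
    total = 0.0
    for k in range(n_steps):
        eps = eps + rng.normal(0.0, np.sqrt(dgamma), size=eps.shape)  # variance grows by dgamma
        gamma = (k + 1) * dgamma
        diff = (score_from_denoiser(denoiser, x1 + eps, gamma)
                - score_from_denoiser(denoiser, x2 + eps, gamma))
        total += np.sum(diff * diff) * dgamma
    return np.sqrt(total)
```

Averaging the squared estimate over several independent trajectories reduces the Monte Carlo variance; with a diffusion-style neural denoiser, the calls to `denoiser` would typically be batched over noise levels.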

5. Applications and Empirical Evaluation

The IEM has been evaluated for image similarity, perceptual quality, and representation learning:

  • A diffusion-style denoiser (Hourglass Diffusion Transformer architecture) was trained on ImageNet, so the resulting IEM is learned in a fully unsupervised manner (without access to human perceptual data or labels).
  • On datasets such as TID2013, LIVE, CSIQ, and TQD, the learned IEM matched or exceeded the performance of state-of-the-art supervised metrics in predicting human mean opinion scores and forced-choice judgments.
  • The metric naturally adapts to both global and local distortions, offering improved sensitivity to subtle differences closely aligned with human perception.

Beyond image quality assessment, the IEM framework enables unsupervised clustering, data retrieval, representation learning, and potentially objective tuning for image restoration or compression, wherever a data-adaptive metric is needed.

6. Theoretical Connections and Generalizations

For $\Gamma \to \infty$, the average local metric satisfies:

$$\mathbb{E}\left[ -\nabla^2 \log p(x) \right] = \mathbb{E}\left[ \nabla \log p(x)\, \nabla \log p(x)^\top \right]$$
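This identity follows from integration by parts, assuming $p$ is smooth and decays at infinity so that the boundary term vanishes:

$$\mathbb{E}\left[ -\nabla^2 \log p(x) \right] = -\int p(x)\, \nabla^2 \log p(x)\, dx = \int \nabla p(x)\, \nabla \log p(x)^\top dx = \mathbb{E}\left[ \nabla \log p(x)\, \nabla \log p(x)^\top \right].$$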

This links the IEM to classical Fisher information identities. For Gaussian signals, the IEM and Mahalanobis coincide globally and locally. For other distributions, the IEM defines a distribution-adaptive Riemannian geometry that incorporates higher-order structure. The approach generalizes to any continuous domain and can be transferred to domains such as speech, time series, or medical imaging, provided a suitable denoiser is available and a sufficiently rich family of noise levels is sampled.

7. Summary Table: Theoretical and Practical Properties

| Property | Gaussian Case | General Distribution |
|---|---|---|
| Local metric | Mahalanobis | Data-adaptive |
| Global metric | Mahalanobis | Integrated over the score field |
| Computational requirements | Closed-form | Requires a learned denoiser and Monte Carlo integration |
| Geometric nature | Linear/elliptical | Nonlinear/Riemannian |

The IEM’s formulation unifies information, estimation, and geometry, permitting data-dependent metrics to be learned directly from the data distribution; on complex natural data, such metrics match or outperform supervised and hand-crafted alternatives (Ohayon et al., 2 Oct 2025).
