
Natural Wasserstein Metric for Robust Attribution

Updated 11 December 2025
  • Natural Wasserstein Metric is a data-dependent ground metric that integrates learned feature covariance to stabilize attribution estimates in machine learning models.
  • It replaces the Euclidean ground metric with a Mahalanobis-type metric, significantly reducing spectral amplification and enabling non-vacuous robust certification in deep networks.
  • The approach combines robust influence function analysis, sensitivity kernel evaluation, and Lipschitz constant estimation to deliver efficient, closed-form influence intervals in both convex and deep models.

The Natural Wasserstein metric is a data-dependent ground metric that quantifies perturbations in the geometry induced by a model’s own feature covariance. It is designed to address the problem of spectral amplification in distributionally robust data attribution, stabilizing attribution estimates and enabling non-vacuous certification for influence-based data attribution methods in both convex models and deep neural networks (Li et al., 9 Dec 2025).

1. Influence Functions and Distributional Robustness

Classical influence functions quantify the infinitesimal effect of upweighting a single training example $z_i$ on the prediction or loss for a test input $z_{\rm test}$. In the context of empirical risk minimization,

$$\mathcal I(z_i, z_{\rm test}) = -g_{\rm test}^\top H^{-1} g_i,$$

where $g_i = \nabla_\theta \ell(\hat\theta, z_i)$, $g_{\rm test} = \nabla_\theta \ell(\hat\theta, z_{\rm test})$, and $H = \frac{1}{n} \sum_{j=1}^n \nabla^2_\theta \ell(\hat\theta, z_j)$, with $\ell(\theta, z)$ a $C^3$ loss and $H \succ 0$.
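As a minimal illustration, the nominal influence can be computed directly from per-sample gradients and the averaged Hessian. The sketch below assumes `g_i`, `g_test`, and `H` have already been extracted from a fitted model (names here are illustrative), and uses a linear solve rather than an explicit inverse.

```python
import numpy as np

def nominal_influence(g_i: np.ndarray, g_test: np.ndarray, H: np.ndarray) -> float:
    """Classical influence I(z_i, z_test) = -g_test^T H^{-1} g_i.

    Uses a linear solve (H u = g_test) instead of forming H^{-1}.
    """
    u = np.linalg.solve(H, g_test)  # u = H^{-1} g_test
    return -float(g_i @ u)

# Hypothetical usage: grads[i] holds the gradient at training point z_i.
# influences = [nominal_influence(g, g_test, H) for g in grads]
```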

However, standard attribution scores are highly sensitive to distributional perturbations. The Wasserstein-Robust Influence Function (W-RIF) formalizes this by taking the supremum of the influence functional over all distributions $Q$ within a $p$-Wasserstein ball of radius $\rho$ centered at the empirical distribution $P_n = \frac{1}{n} \sum_i \delta_{z_i}$:

$$\mathrm{W}\text{-}\mathrm{RIF}(z_i; \rho) = \sup_{Q:\, W_p(Q, P_n) \leq \rho} \mathcal I_Q(z_i, z_{\rm test}).$$

This formulation rigorously certifies attribution stability against worst-case distributional drifts (Li et al., 9 Dec 2025).

2. Sensitivity Kernel and Robust Influence Certificates

W-RIF can be analyzed through the first-order expansion of the influence functional in powers of $Q - P_n$:

$$\mathcal I_Q = \mathcal I_{P_n} + \int S(z)\, d(Q - P_n)(z) + O(\|Q - P_n\|^2),$$

where the sensitivity kernel $S(z)$ is

$$S(z) = u^\top \bigl(\nabla^2_\theta \ell(\hat\theta, z) - H\bigr) v + w^\top H_{\rm test}\, v + u^\top H_i\, w,$$

with $u = H^{-1} g_{\rm test}$, $v = H^{-1} g_i$, $w = H^{-1} \nabla_\theta \ell(\hat\theta, z)$, $H_{\rm test} = \nabla^2_\theta \ell(\hat\theta, z_{\rm test})$, and $H_i = \nabla^2_\theta \ell(\hat\theta, z_i)$. The first term captures the perturbation of the Hessian, while the two scalar terms capture the parameter shift propagating through the test and training gradients.
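A minimal sketch of evaluating this kernel at a point $z$, assuming per-sample gradients and Hessians are available as NumPy arrays (argument names are illustrative, not the paper's API):

```python
import numpy as np

def sensitivity_kernel(hess_z, grad_z, H, H_test, H_i, g_i, g_test):
    """S(z) = u^T (hess_z - H) v + w^T H_test v + u^T H_i w,
    with u = H^{-1} g_test, v = H^{-1} g_i, w = H^{-1} grad_z."""
    u = np.linalg.solve(H, g_test)
    v = np.linalg.solve(H, g_i)
    w = np.linalg.solve(H, grad_z)
    return float(u @ (hess_z - H) @ v + w @ H_test @ v + u @ H_i @ w)
```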

By applying Kantorovich–Rubinstein duality, the robustness certificate is governed by the Lipschitz constant $L_S$ of $S$:

$$L_S = \sup_{z \neq z'} \frac{|S(z) - S(z')|}{\|z - z'\|}.$$

The closed-form certificate for small $\rho$ is

$$\mathrm{W}\text{-}\mathrm{RIF}(z_i; \rho) = \mathcal I(z_i, z_{\rm test}) + \rho L_S + O(\rho^2),$$

yielding efficient, closed-form robust influence intervals in convex settings (Li et al., 9 Dec 2025).
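In practice the supremum defining $L_S$ can be approximated by the maximum pairwise ratio over the training sample, from which the first-order interval follows directly. A sketch under those assumptions:

```python
import numpy as np
from itertools import combinations

def lipschitz_estimate(S_vals, Z):
    """Empirical estimate of L_S: max over sample pairs of
    |S(z_j) - S(z_k)| / ||z_j - z_k||."""
    L = 0.0
    for j, k in combinations(range(len(S_vals)), 2):
        d = np.linalg.norm(Z[j] - Z[k])
        if d > 0:
            L = max(L, abs(S_vals[j] - S_vals[k]) / d)
    return L

def robust_interval(I_i, rho, L_S):
    """First-order W-RIF certificate [I_i - rho*L_S, I_i + rho*L_S]."""
    return I_i - rho * L_S, I_i + rho * L_S
```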

3. Natural Wasserstein Metric and Spectral Amplification

In deep neural networks, naïvely applying a Euclidean Wasserstein ball in high-dimensional feature space yields vacuous certifications due to spectral amplification: the ill-conditioning of learned feature covariance matrices inflates Lipschitz bounds by factors exceeding $10^4$. This is observed empirically as 0% certified robustness for methods such as TRAK when using Euclidean geometry (Li et al., 9 Dec 2025).

The Natural Wasserstein metric replaces the Euclidean ground metric $\|\phi(z) - \phi(z')\|_2$ with the Mahalanobis-type metric $\|\phi(z) - \phi(z')\|_{Q^{-1}}$, where $Q$ denotes the feature covariance. This alignment with the geometry of the learned representation eliminates spectral amplification and ensures that the distributional perturbation set matches the curvature of the sensitivity kernel.
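A sketch of the Natural ground metric, assuming features are available as rows of an array `Phi`; the small ridge `eps` added to the covariance is this sketch's numerical-stability assumption, not part of the definition:

```python
import numpy as np

def natural_metric(phi_a, phi_b, Q, eps=1e-6):
    """Mahalanobis-type ground metric ||phi(z) - phi(z')||_{Q^{-1}}.

    Q is the feature covariance; eps * I regularizes the solve when
    Q is ill-conditioned (an assumption of this sketch).
    """
    diff = phi_a - phi_b
    Qr = Q + eps * np.eye(Q.shape[0])
    return float(np.sqrt(diff @ np.linalg.solve(Qr, diff)))

# Hypothetical usage with an (n, d) feature matrix Phi:
# Q = np.cov(Phi, rowvar=False)
# d_nat = natural_metric(Phi[0], Phi[1], Q)
```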

Empirical results demonstrate that replacing the Euclidean metric with the Natural metric reduces worst-case sensitivity by $76\times$, yielding non-vacuous, certifiable robust attribution for deep networks. For example, using ResNet-18 on CIFAR-10, Natural W-TRAK certifies 68.7% of ranking pairs, compared to 0% for the Euclidean baseline (Li et al., 9 Dec 2025).

4. Algorithmic Procedure and Complexity

The W-RIF pipeline in the convex setting involves:

  1. Solving the ERM to obtain $\hat\theta$.
  2. Forming and inverting the Hessian $H \in \mathbb{R}^{p \times p}$.
  3. Computing gradients and nominal influences for each training point.
  4. Evaluating the sensitivity kernel $S(z_j)$ for each $z_j$.
  5. Estimating the Lipschitz constant $L_S$ by maximizing $|S(z_j) - S(z_k)| / \|z_j - z_k\|$ over pairs.
  6. Forming robust intervals $[\mathcal I_i - \rho L_S,\, \mathcal I_i + \rho L_S]$.

Computational costs are $O(np^2 + p^3)$ for $H$ and $H^{-1}$, $O(np^2)$ for the $S(z_j)$, and $O(n^2 c)$ for the pairwise Lipschitz calculation (with $c$ the cost of one ground-metric evaluation). In high dimensions, $L_S$ may be estimated from random sample pairs or local neighborhoods (Li et al., 9 Dec 2025).
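The following self-contained toy run illustrates steps 1–6 end to end. It makes several simplifying assumptions not taken from the paper: ridge regression stands in for the ERM (with the ridge folded into the averaged Hessian), the ground metric is Euclidean distance on inputs only, and $L_S$ is estimated from random pairs as suggested above for high dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy convex problem: ridge regression (an illustrative stand-in for the ERM).
n, p, lam, rho = 200, 5, 1e-2, 0.1
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
x_test, y_test = rng.normal(size=p), 0.0

# Steps 1-2: solve the ERM and form the averaged Hessian.
H = X.T @ X / n + lam * np.eye(p)
theta = np.linalg.solve(H, X.T @ y / n)

# Step 3: per-sample gradients and nominal influences.
resid = X @ theta - y
grads = resid[:, None] * X                 # g_j = (x_j^T theta - y_j) x_j
g_test = (x_test @ theta - y_test) * x_test
u = np.linalg.solve(H, g_test)             # u = H^{-1} g_test
influences = -(grads @ u)                  # I_j = -g_test^T H^{-1} g_j

# Step 4: sensitivity kernel S(z_j) for the influence of point i.
i = 0
v = np.linalg.solve(H, grads[i])           # v = H^{-1} g_i
H_test = np.outer(x_test, x_test)          # per-sample Hessian at z_test
H_i = np.outer(X[i], X[i])                 # per-sample Hessian at z_i
def S(j):
    hess_j = np.outer(X[j], X[j])
    w = np.linalg.solve(H, grads[j])
    return u @ (hess_j - H) @ v + w @ H_test @ v + u @ H_i @ w
S_vals = np.array([S(j) for j in range(n)])

# Step 5: Lipschitz constant from random pairs (avoids the O(n^2) scan).
pairs = rng.integers(0, n, size=(2000, 2))
num = np.abs(S_vals[pairs[:, 0]] - S_vals[pairs[:, 1]])
den = np.linalg.norm(X[pairs[:, 0]] - X[pairs[:, 1]], axis=1)
mask = den > 0
L_S = float(np.max(num[mask] / den[mask]))

# Step 6: robust interval for point i.
lo, hi = influences[i] - rho * L_S, influences[i] + rho * L_S
print(f"I_{i} = {influences[i]:.4f}, certified interval [{lo:.4f}, {hi:.4f}]")
```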

5. Theoretical Guarantees and Duality

Coverage guarantees for robust influence intervals are established via concentration inequalities for empirical Wasserstein distances. With appropriate adjustment of the radius by the empirical deviation $\delta_n(\alpha)$, the interval

$$\left[\mathcal I_{P_n}(z_i) \pm (\rho + \delta_n(\alpha))\, L_S\right]$$

contains the true population-level robust influence with high probability. The derivation leverages distributionally robust optimization duality and Kantorovich–Rubinstein duality, ensuring that the certificate reflects the worst-case influence achievable under feasible Wasserstein perturbations (Li et al., 9 Dec 2025).
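A hedged sketch of the adjusted interval; the concentration term $\delta_n(\alpha)$ below uses a generic placeholder rate, since the actual deviation bound for empirical Wasserstein distances depends on the data dimension and moment assumptions:

```python
import numpy as np

def coverage_adjusted_interval(I_i, rho, L_S, n, alpha, C=1.0):
    """[I_i ± (rho + delta_n(alpha)) * L_S] with a placeholder rate
    delta_n(alpha) = C * sqrt(log(1/alpha) / n); the true empirical-
    Wasserstein deviation term is dimension-dependent."""
    delta = C * np.sqrt(np.log(1.0 / alpha) / n)
    half = (rho + delta) * L_S
    return I_i - half, I_i + half
```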

6. Extensions to Deep Networks and Attribution Stability

For deep neural networks, the convex W-RIF approach does not directly transfer due to nonconvexity and initialization sensitivity ("basin-hopping" under distribution shifts). The adopted strategy is to linearize the influence functional in feature space for the fixed (pretrained or converged) network, then apply the Wasserstein DRO analysis using the Natural metric. This exactly compensates for feature-space ill-conditioning.
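A TRAK-flavored sketch of a linearized feature-space influence score for a fixed network; the feature matrix `Phi`, the ridge `lam`, and the normalization are assumptions of this sketch rather than the paper's exact recipe:

```python
import numpy as np

def linearized_influence(Phi, phi_test, lam=1e-3):
    """Feature-space influence scores for a fixed (pretrained) network.

    Phi: (n, d) array of per-example features (e.g. projected gradients).
    Returns one score per training example via phi_test^T G^{-1} phi_i,
    where G is a ridge-regularized feature second-moment matrix.
    """
    n, d = Phi.shape
    G = Phi.T @ Phi / n + lam * np.eye(d)
    u = np.linalg.solve(G, phi_test)
    return Phi @ u
```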

The Self-Influence term is shown to equal the Lipschitz constant governing attribution stability, which provides a theoretical foundation for leverage-based anomaly detection. Empirically, Self-Influence yields strong results for label noise detection, achieving 0.970 AUROC and identifying 94.1% of corrupted labels within the top 20% of training data (Li et al., 9 Dec 2025).
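A short sketch of self-influence scoring for label-noise detection, using the sign convention $g_i^\top H^{-1} g_i$ so that scores are nonnegative for $H \succ 0$ (variable names are illustrative):

```python
import numpy as np

def self_influence_scores(grads, H):
    """Self-influence g_i^T H^{-1} g_i for each training point; large
    values flag high-leverage, potentially mislabeled examples."""
    W = np.linalg.solve(H, grads.T)        # columns are H^{-1} g_i
    return np.einsum('ip,pi->i', grads, W)

# Hypothetical usage: inspect the top 20% of training points by score.
# suspect = np.argsort(-self_influence_scores(grads, H))[: int(0.2 * len(grads))]
```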

7. Summary and Significance

The Natural Wasserstein metric constitutes a principled, geometrically adaptive ground metric for robust data attribution. By integrating feature covariance structure, it removes the instability inherent in Euclidean robustness analysis for high-dimensional (especially deep) models and enables the first non-vacuous certified bounds for neural network attribution. The approach generalizes from convex models to deep networks, systematically aligning the perturbation geometry with the local sensitivity landscape, thereby producing efficient, closed-form, and certifiably tight influence intervals fundamental for robust, interpretable machine learning (Li et al., 9 Dec 2025).

References (1)
