Natural Wasserstein Metric for Robust Attribution
- Natural Wasserstein Metric is a data-dependent ground metric that integrates learned feature covariance to stabilize attribution estimates in machine learning models.
- It replaces the Euclidean ground metric with a Mahalanobis-type metric, significantly reducing spectral amplification and enabling non-vacuous robust certification in deep networks.
- The approach combines robust influence function analysis, sensitivity kernel evaluation, and Lipschitz constant estimation to deliver efficient, closed-form influence intervals in both convex and deep models.
The Natural Wasserstein metric is a data-dependent ground metric that quantifies perturbations in the geometry induced by a model’s own feature covariance. It is designed to address the problem of spectral amplification in distributionally robust data attribution, stabilizing attribution estimates and enabling non-vacuous certification for influence-based data attribution methods in both convex models and deep neural networks (Li et al., 9 Dec 2025).
1. Influence Functions and Distributional Robustness
Classical influence functions quantify the infinitesimal effect of upweighting a single training example $z_i$ on the prediction or loss for a test input $z_{\text{test}}$. In the context of empirical risk minimization,

$$\mathcal{I}(z_i, z_{\text{test}}) = -\nabla_\theta \ell(z_{\text{test}}, \hat\theta)^\top H^{-1} \nabla_\theta \ell(z_i, \hat\theta),$$

where $\hat\theta = \arg\min_\theta \frac{1}{n}\sum_{j=1}^n \ell(z_j, \theta)$, $H = \frac{1}{n}\sum_{j=1}^n \nabla_\theta^2 \ell(z_j, \hat\theta)$, and $\{z_j\}_{j=1}^n$ is the training set, with a loss $\ell$ and parameters $\theta \in \mathbb{R}^d$.
However, standard attribution scores are highly sensitive to distributional perturbations. The Wasserstein-Robust Influence Function (W-RIF) formalizes this by taking the supremum of the influence functional over all distributions $Q$ within a 1-Wasserstein ball of radius $\epsilon$ centered at the empirical distribution $\hat P_n$:

$$\mathcal{I}^{\mathrm{rob}}_\epsilon(z_i, z_{\text{test}}) = \sup_{Q:\, W_1(Q, \hat P_n) \le \epsilon} \mathcal{I}_Q(z_i, z_{\text{test}}),$$

where $\mathcal{I}_Q$ denotes the influence functional evaluated under training distribution $Q$. This formulation rigorously certifies attribution stability against worst-case distributional drifts (Li et al., 9 Dec 2025).
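As a concrete illustration, the following minimal sketch computes nominal influence scores for an L2-regularized logistic regression model; the function names, the choice of loss, and the regularization strength are illustrative assumptions, not the reference implementation of (Li et al., 9 Dec 2025).

```python
# Minimal sketch of nominal influence scores I(z_i, z_test) = -g_test^T H^{-1} g_i
# for an L2-regularized logistic regression. Function names, the choice of loss, and
# the regularization strength are illustrative, not the reference implementation.
import numpy as np

def grad_loss(theta, x, y, lam=1e-3):
    """Per-example gradient of the regularized logistic loss, y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-x @ theta))
    return (p - y) * x + lam * theta

def hessian(theta, X, lam=1e-3):
    """Empirical Hessian H = (1/n) sum_j p_j (1 - p_j) x_j x_j^T + lam * I."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    w = p * (1.0 - p)
    return (X.T * w) @ X / len(X) + lam * np.eye(X.shape[1])

def nominal_influences(theta, X, y, x_test, y_test):
    """I_hat_i = -g_test^T H^{-1} g_i for every training point (returns an n-vector)."""
    h_inv_g_test = np.linalg.solve(hessian(theta, X), grad_loss(theta, x_test, y_test))
    G = np.stack([grad_loss(theta, X[i], y[i]) for i in range(len(X))])
    return -(G @ h_inv_g_test)
```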
2. Sensitivity Kernel and Robust Influence Certificates
W-RIF can be analyzed through the first-order expansion of the influence functional in powers of $\epsilon$:

$$\mathcal{I}_Q(z_i, z_{\text{test}}) = \hat{\mathcal{I}}(z_i, z_{\text{test}}) + \int \Phi(z)\, d(Q - \hat P_n)(z) + O(\epsilon^2), \qquad W_1(Q, \hat P_n) \le \epsilon,$$

where the sensitivity kernel $\Phi$ is the Gateaux derivative of the influence functional with respect to the training distribution,

$$\Phi(z) = \frac{d}{dt}\bigg|_{t=0^+} \mathcal{I}_{(1-t)\hat P_n + t\,\delta_z}(z_i, z_{\text{test}}),$$

with $g_{\text{test}} = \nabla_\theta \ell(z_{\text{test}}, \hat\theta)$, $g_i = \nabla_\theta \ell(z_i, \hat\theta)$, $H = \frac{1}{n}\sum_j \nabla_\theta^2 \ell(z_j, \hat\theta)$, $\delta_z$ the Dirac measure at $z$, and $\hat{\mathcal{I}}(z_i, z_{\text{test}}) = -g_{\text{test}}^\top H^{-1} g_i$ the nominal influence.

By applying Kantorovich–Rubinstein duality, the robustness certificate is governed by the Lipschitz constant of $\Phi$:

$$\sup_{Q:\, W_1(Q, \hat P_n) \le \epsilon} \int \Phi(z)\, d(Q - \hat P_n)(z) \le \epsilon\, \mathrm{Lip}(\Phi).$$

The closed-form certificate for small $\epsilon$ is

$$\big|\mathcal{I}^{\mathrm{rob}}_\epsilon(z_i, z_{\text{test}}) - \hat{\mathcal{I}}(z_i, z_{\text{test}})\big| \le \epsilon\, \mathrm{Lip}(\Phi) + O(\epsilon^2),$$

yielding efficient, closed-form robust influence intervals $\hat{\mathcal{I}} \pm \epsilon\, \mathrm{Lip}(\Phi)$ in convex settings (Li et al., 9 Dec 2025).
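The certificate can be instantiated once the sensitivity kernel is evaluable pointwise. The sketch below, assuming $\Phi$ is available as a black-box callable `phi` and the ground metric as a callable `metric` (both placeholders), estimates $\mathrm{Lip}(\Phi)$ over training pairs and forms the first-order interval.

```python
# Hedged sketch of the closed-form certificate: treat the sensitivity kernel as a
# black-box callable `phi` and the ground metric as `metric`, estimate Lip(phi) over
# all training pairs, and form the first-order interval I_hat +/- eps * L.
import itertools
import numpy as np

def lipschitz_estimate(phi, Z, metric):
    """Empirical Lipschitz constant max_{j != k} |phi(z_j) - phi(z_k)| / d(z_j, z_k)."""
    vals = np.array([phi(z) for z in Z])
    lip = 0.0
    for j, k in itertools.combinations(range(len(Z)), 2):
        dist = metric(Z[j], Z[k])
        if dist > 0:
            lip = max(lip, abs(vals[j] - vals[k]) / dist)
    return lip

def robust_interval(i_hat, eps, lip):
    """First-order robust influence interval [I_hat - eps * L, I_hat + eps * L]."""
    return i_hat - eps * lip, i_hat + eps * lip
```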
3. Natural Wasserstein Metric and Spectral Amplification
In deep neural networks, naïvely applying a Euclidean Wasserstein ball in high-dimensional feature space yields vacuous certifications due to spectral amplification: the ill-conditioning of learned feature covariance matrices severely inflates the resulting Lipschitz bounds. This is observed empirically as 0% certified robustness for methods such as TRAK when using Euclidean geometry (Li et al., 9 Dec 2025).
The Natural Wasserstein metric replaces the Euclidean ground metric with the Mahalanobis-type metric $d_\Sigma(x, x') = \sqrt{(x - x')^\top \Sigma^{-1} (x - x')}$, where $\Sigma$ denotes the feature covariance. This alignment with the geometry of the learned representation eliminates spectral amplification and ensures that the distributional perturbation set matches the curvature of the sensitivity kernel.
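A minimal sketch of this ground metric, assuming the feature covariance is estimated empirically; the small ridge term added for numerical stability is an assumption, not part of the source.

```python
# Sketch of the Natural (Mahalanobis-type) ground metric
# d_Sigma(x, y) = sqrt((x - y)^T Sigma^{-1} (x - y)) built from an empirical feature
# covariance; the small ridge added for numerical stability is an assumption.
import numpy as np

def natural_metric(features, ridge=1e-6):
    """Return a callable d_Sigma from the empirical covariance of `features` (n x d)."""
    Sigma = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    Sigma_inv = np.linalg.inv(Sigma)

    def d(x, y):
        diff = x - y
        return float(np.sqrt(diff @ Sigma_inv @ diff))

    return d
```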
Empirical results demonstrate that replacing the Euclidean metric with the Natural metric substantially reduces worst-case sensitivity, yielding non-vacuous, certifiable robust attribution for deep networks. For example, with ResNet-18 on CIFAR-10, Natural W-TRAK certifies 68.7% of ranking pairs, compared to 0% for the Euclidean baseline (Li et al., 9 Dec 2025).
4. Algorithmic Procedure and Complexity
The W-RIF pipeline in the convex setting involves:
- Solving the ERM to obtain $\hat\theta$.
- Forming and inverting the Hessian $H = \frac{1}{n}\sum_j \nabla_\theta^2 \ell(z_j, \hat\theta)$.
- Computing gradients $g_i = \nabla_\theta \ell(z_i, \hat\theta)$ and nominal influences $\hat{\mathcal{I}}_i = -g_{\text{test}}^\top H^{-1} g_i$ for each training point.
- Evaluating the sensitivity kernel $\Phi(z_j)$ for each training point $z_j$.
- Estimating the Lipschitz constant $\hat L$ by maximizing $|\Phi(z_j) - \Phi(z_k)| / d(z_j, z_k)$ over pairs $j \neq k$.
- Forming robust intervals $\hat{\mathcal{I}}_i \pm \epsilon \hat L$.
Computational costs are $O(nd^2 + d^3)$ for forming and inverting $H$, $O(nd)$ for the gradients and nominal influences, and $O(n^2 c_d)$ for the pairwise Lipschitz calculation (with $c_d$ the cost of one ground-metric evaluation). In high dimensions, $\hat L$ may be estimated from random sample pairs or local neighborhoods, as sketched below (Li et al., 9 Dec 2025).
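A hedged sketch of the random-pair variant; the pair count and seed are arbitrary illustrative choices, and since the sampled maximum can only under-shoot the full pairwise estimate, in practice it would be paired with a conservative margin.

```python
# Hedged sketch of the random-pair variant: approximate the pairwise Lipschitz
# estimate from `n_pairs` sampled pairs instead of all O(n^2). The pair count and
# seed are arbitrary illustrative choices; the result lower-bounds the full estimate.
import numpy as np

def lipschitz_estimate_subsampled(phi_vals, Z, metric, n_pairs=10_000, seed=0):
    """Approximate max |phi_j - phi_k| / d(z_j, z_k) over randomly drawn pairs."""
    rng = np.random.default_rng(seed)
    lip = 0.0
    for _ in range(n_pairs):
        j, k = rng.choice(len(Z), size=2, replace=False)
        dist = metric(Z[j], Z[k])
        if dist > 0:
            lip = max(lip, abs(phi_vals[j] - phi_vals[k]) / dist)
    return lip
```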
5. Theoretical Guarantees and Duality
Coverage guarantees for robust influence intervals are established via concentration inequalities for empirical Wasserstein distances. With the radius enlarged by the empirical deviation $\delta_n \ge W_1(\hat P_n, P)$, the interval

$$\big[\hat{\mathcal{I}}_i - (\epsilon + \delta_n)\, \hat L,\ \hat{\mathcal{I}}_i + (\epsilon + \delta_n)\, \hat L\big]$$

contains the true population-level robust influence with high probability. The derivation leverages distributionally robust optimization duality and Kantorovich–Rubinstein duality, ensuring that the certificate reflects the worst-case influence achievable under feasible Wasserstein perturbations (Li et al., 9 Dec 2025).
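A minimal sketch of forming the coverage-adjusted interval, assuming an estimate `delta_n` of the empirical Wasserstein deviation is supplied externally (how it is obtained, e.g., via a concentration bound, is left open):

```python
# Minimal sketch of the coverage-adjusted interval: the nominal radius eps is enlarged
# by an externally supplied estimate delta_n of the empirical Wasserstein deviation.
def coverage_adjusted_interval(i_hat, eps, delta_n, lip):
    """Interval [I_hat - (eps + delta_n) * L, I_hat + (eps + delta_n) * L]."""
    radius = (eps + delta_n) * lip
    return i_hat - radius, i_hat + radius
```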
6. Extensions to Deep Networks and Attribution Stability
For deep neural networks, the convex W-RIF approach does not directly transfer due to nonconvexity and initialization sensitivity ("basin-hopping" under distribution shifts). The adopted strategy is to linearize the influence functional in feature space for the fixed (pretrained or converged) network, then apply the Wasserstein DRO analysis using the Natural metric. This exactly compensates for feature-space ill-conditioning.
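A hedged sketch of this feature-space strategy, assuming penultimate-layer features of the frozen network have already been extracted into arrays and are whitened by $\Sigma^{-1/2}$ (so that Euclidean analysis in whitened coordinates matches the Natural metric in the original feature space); the dot-product scoring is a simplification for illustration, not the paper's exact linearized estimator.

```python
# Hedged sketch of the feature-space strategy: features of the frozen network are
# whitened by W = Sigma^{-1/2}, so Euclidean analysis in whitened coordinates matches
# the Natural metric in the original feature space. The dot-product scoring below is
# a simplification for illustration, not the paper's exact linearized estimator.
import numpy as np

def whitening_matrix(features, ridge=1e-6):
    """W = Sigma^{-1/2} via eigendecomposition of the (ridged) feature covariance."""
    Sigma = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    eigval, eigvec = np.linalg.eigh(Sigma)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T

def linearized_scores(train_feats, test_feat, W):
    """Illustrative linearized attribution scores <W f_i, W f_test> per training point."""
    return (train_feats @ W.T) @ (W @ test_feat)
```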
The Self-Influence term is shown to equal the Lipschitz constant governing attribution stability, which provides a theoretical foundation for leverage-based anomaly detection. Empirically, Self-Influence yields strong results for label noise detection, achieving 0.970 AUROC and identifying 94.1% of corrupted labels within the top 20% of training data (Li et al., 9 Dec 2025).
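One possible instantiation of this leverage-based screening, reusing the whitening sketch above; the 20% cutoff mirrors the evaluation protocol quoted, while the helper names are hypothetical.

```python
# Possible instantiation of leverage-based label-noise screening: rank training points
# by self-influence in whitened feature space and inspect the top fraction. The 20%
# cutoff mirrors the evaluation quoted above; the helper names are hypothetical.
import numpy as np

def self_influence(train_feats, W):
    """Self-influence proxy ||W f_i||^2 for each training point."""
    Zw = train_feats @ W.T
    return np.einsum('ij,ij->i', Zw, Zw)

def flag_suspects(scores, top_frac=0.20):
    """Indices of the top `top_frac` highest self-influence points."""
    k = max(1, int(top_frac * len(scores)))
    return np.argsort(scores)[::-1][:k]
```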
7. Summary and Significance
The Natural Wasserstein metric constitutes a principled, geometrically adaptive ground metric for robust data attribution. By integrating feature covariance structure, it removes the instability inherent to Euclidean robustness analysis in high-dimensional (especially deep) models and enables the first non-vacuous certified bounds for neural network attribution. The approach generalizes from convex models to deep networks, systematically aligning the perturbation geometry with the local sensitivity landscape, thereby producing efficient, closed-form, and certifiably tight influence intervals fundamental for robust, interpretable machine learning (Li et al., 9 Dec 2025).