Wasserstein-RIF: Robust Data Attribution
- Wasserstein-Robust Influence Functions (W-RIF) are robust extensions of classical influence functions that certify model sensitivity under worst-case distributional perturbations.
- They employ optimal transport and Wasserstein metrics to compute certified intervals, providing formal coverage guarantees for leave-one-out and population influence.
- In deep networks, Natural Wasserstein metrics yield tighter certificates and facilitate robust anomaly detection, overcoming Euclidean certification limitations.
Wasserstein-Robust Influence Functions (W-RIF) generalize classical influence functions to provide certified robustness under distributional shifts, using optimal transport metrics. W-RIF enables the quantification of how training examples influence model predictions while accounting for worst-case perturbations measured in Wasserstein distance. This framework yields formal coverage guarantees in convex models and provides new geometric tools for certified data attribution in deep neural networks, overcoming severe limitations of Euclidean-based certification by introducing a Natural Wasserstein metric derived from feature covariance geometry (Li et al., 9 Dec 2025).
1. Classical Influence Functions and Their Limitations
Given a data-generating distribution $P$ over $\mathcal{Z}$ and a twice-differentiable loss $\ell(z; \theta)$, the empirical risk minimizer is
$$\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell(z_i; \theta)$$
for the empirical distribution $\hat{P}_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{z_i}$. The classical influence of training point $z$ on the test loss at $z_{\mathrm{test}}$ is
$$\mathcal{I}(z, z_{\mathrm{test}}) = -\nabla_\theta \ell(z_{\mathrm{test}}; \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_\theta \ell(z; \hat{\theta}), \qquad H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta \ell(z_i; \hat{\theta}),$$
where $H_{\hat{\theta}}$ is assumed positive definite. This formula quantifies the first-order impact of up-weighting or removing $z$, but it lacks robustness to distributional perturbations and fails to provide certified intervals for influence under data shifts (Li et al., 9 Dec 2025).
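The classical formula above can be sketched numerically; the following is a minimal illustration on a hypothetical regularized logistic-regression model (the dataset, step size, and regularization are invented for the example, not from the paper):

```python
import numpy as np

# Toy setup: L2-regularized logistic regression, where the classical
# influence formula I(z, z_test) = -g_test^T H^{-1} g_z applies.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.3 * rng.normal(size=n) > 0).astype(float)
lam = 1e-2  # regularization keeps the Hessian positive definite

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def loss_grad(theta, x, t):
    # Gradient of the per-example logistic loss plus its regularization share.
    return (sigmoid(x @ theta) - t) * x + lam * theta

# Fit by plain gradient descent on the regularized empirical risk.
theta = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ theta)
    theta -= 0.5 * ((X.T @ (p - y)) / n + lam * theta)

# Empirical Hessian: (1/n) sum_i p_i (1 - p_i) x_i x_i^T + lam * I.
p = sigmoid(X @ theta)
H = (X * (p * (1 - p))[:, None]).T @ X / n + lam * np.eye(d)

x_test, y_test = X[0], y[0]
g_test = loss_grad(theta, x_test, y_test)
# Influence of each training point on the test loss at (x_test, y_test).
influences = np.array(
    [-g_test @ np.linalg.solve(H, loss_grad(theta, X[i], y[i])) for i in range(n)]
)
print(influences[:3])
```

Note that this gives only the point estimate $\hat{\mathcal{I}}$; it carries no robustness guarantee, which is the gap W-RIF addresses.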
2. Wasserstein Uncertainty Sets and Rationale
The $p$-Wasserstein distance between distributions $P$ and $Q$ on $\mathcal{Z}$ is
$$W_p(P, Q) = \left( \inf_{\gamma \in \Pi(P, Q)} \int \|x - y\|^p \, d\gamma(x, y) \right)^{1/p},$$
where $\Pi(P, Q)$ denotes the set of couplings with marginals $P$ and $Q$. The corresponding Wasserstein ball $\mathcal{B}_\epsilon(P) = \{Q : W_p(Q, P) \le \epsilon\}$ defines an adversarial uncertainty set for robust analysis. Wasserstein metrics are preferred over Euclidean parameter perturbations because they quantify distributional (mass-transport) shifts, accommodate support changes such as outlier addition or removal, and allow tractable duality-based reformulations via Kantorovich–Rubinstein duality. This construction provides a natural notion of uncertainty for robust influence-function analysis (Li et al., 9 Dec 2025).
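As a quick illustration of why mass-transport distances suit this role, the 1-D $W_1$ distance (via SciPy) responds gracefully both to uniform shifts and to support changes such as an added outlier; the data here are synthetic:

```python
import numpy as np
from scipy.stats import wasserstein_distance  # 1-D W_1 between empirical samples

rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=1000)

# Uniform shift of every point: W_1 moves by exactly the shift size.
shifted = base + 0.1
print(wasserstein_distance(base, shifted))  # ~0.1

# Support change: append one far outlier. Mass ~1/1001 travels distance ~50,
# so W_1 grows only modestly -- the metric handles added/removed points,
# which a Euclidean parameter perturbation cannot express.
with_outlier = np.append(base, 50.0)
print(wasserstein_distance(base, with_outlier))
```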
3. Definition and Computation of W-RIF
For any radius $\epsilon > 0$, the W-RIF at $z$ and $z_{\mathrm{test}}$ is the worst-case range of the influence over the Wasserstein ball:
$$\mathrm{WRIF}_\epsilon(z, z_{\mathrm{test}}) = \left[ \inf_{Q \in \mathcal{B}_\epsilon(\hat{P}_n)} \mathcal{I}_Q(z, z_{\mathrm{test}}),\; \sup_{Q \in \mathcal{B}_\epsilon(\hat{P}_n)} \mathcal{I}_Q(z, z_{\mathrm{test}}) \right].$$
Substituting the empirical estimators of the Hessian and the gradients, a first-order expansion of $\mathcal{I}_Q$ around $\hat{P}_n$ yields a linearized sensitivity functional, where the complete sensitivity kernel is the sum of a Hessian-perturbation term and a gradient-perturbation term. If the influence functional is $L$-Lipschitz with respect to the input norm, the Kantorovich–Rubinstein dual form implies
$$\left| \mathcal{I}_Q(z, z_{\mathrm{test}}) - \hat{\mathcal{I}}(z, z_{\mathrm{test}}) \right| \le L \, W_1(Q, \hat{P}_n) \le L\epsilon \quad \text{for all } Q \in \mathcal{B}_\epsilon(\hat{P}_n),$$
leading to the closed-form certified interval
$$\mathrm{WRIF}_\epsilon(z, z_{\mathrm{test}}) \subseteq \left[ \hat{\mathcal{I}}(z, z_{\mathrm{test}}) - L\epsilon,\; \hat{\mathcal{I}}(z, z_{\mathrm{test}}) + L\epsilon \right].$$
This interval is guaranteed to contain the leave-one-out or true population-level influence, as detailed below (Li et al., 9 Dec 2025).
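The certified interval rests entirely on the duality step $|\mathbb{E}_Q f - \mathbb{E}_P f| \le L \, W_1(P, Q)$ for $L$-Lipschitz $f$; a small numerical check on a synthetic 1-D example (not the paper's setup) confirms the bound that turns a radius $\epsilon$ into the half-width $L\epsilon$:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Numerical check of |E_Q f - E_P f| <= L * W_1(P, Q) for an L-Lipschitz f,
# which is what converts a Wasserstein radius into a certified half-width
# L * eps. Toy 1-D empirical distributions, hypothetical f.
rng = np.random.default_rng(1)
P = rng.normal(0, 1, size=500)
Q = P + rng.uniform(-0.2, 0.2, size=500)  # perturbed sample

L = 2.0
def f(z):
    return L * np.abs(z)  # |f(a) - f(b)| <= L |a - b|, so f is L-Lipschitz

gap = abs(f(Q).mean() - f(P).mean())
bound = L * wasserstein_distance(P, Q)
print(gap <= bound + 1e-12)  # the Lipschitz * W_1 bound dominates the shift
```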
4. Provable Certification and Coverage Guarantees
Leave-one-out influence, given by removing $z_i$ from $\hat{P}_n$, is a specific distributional perturbation: the reweighted distribution $\hat{P}_{n,-i} = \frac{1}{n-1}\sum_{j \ne i} \delta_{z_j}$ lies at a computable Wasserstein distance from $\hat{P}_n$. Setting $\epsilon_i = W_1(\hat{P}_n, \hat{P}_{n,-i})$, the W-RIF interval around $\hat{\mathcal{I}}(z_i, z_{\mathrm{test}})$ certifies the true leave-one-out influence. For population-level guarantees, one selects $\epsilon_n$ via empirical Wasserstein concentration so that, with probability at least $1 - \delta$, $W_1(\hat{P}_n, P) \le \epsilon_n$, and the population influence is covered:
$$\mathbb{P}\!\left[ \mathcal{I}_P(z, z_{\mathrm{test}}) \in \left[ \hat{\mathcal{I}}(z, z_{\mathrm{test}}) - L\epsilon_n,\; \hat{\mathcal{I}}(z, z_{\mathrm{test}}) + L\epsilon_n \right] \right] \ge 1 - \delta.$$
This provides formal certification for robust attribution and data-influence quantification in convex models (Li et al., 9 Dec 2025).
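Because the leave-one-out distribution is just the empirical sample with one point removed, the radius $\epsilon_i$ is directly computable; a toy 1-D sketch (synthetic data, SciPy's $W_1$):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# The leave-one-out distribution is the empirical sample with z_i removed,
# so eps_i = W_1(P_n, P_{n,-i}) is a concrete, computable radius -- no
# adversarial optimization needed. Synthetic 1-D data for illustration.
rng = np.random.default_rng(2)
z = rng.normal(0, 1, size=100)

i = int(np.argmax(np.abs(z)))  # removing an extreme point typically costs more transport
eps_i = wasserstein_distance(z, np.delete(z, i))
eps_typical = wasserstein_distance(z, np.delete(z, 0))
print(eps_i, eps_typical)
```

Plugging $\epsilon_i$ into the interval $[\hat{\mathcal{I}} - L\epsilon_i, \hat{\mathcal{I}} + L\epsilon_i]$ then certifies that point's leave-one-out influence.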
5. Computational Strategies
Exact estimation of the Lipschitz constant $L$ by pairwise slope calculation has computational complexity $O(n^2)$. Practical approximations include:
- Estimating $L$ from pairwise slopes at a random subset of points.
- Solving the dual optimization
$$L = \sup_{z \ne z'} \frac{|f(z) - f(z')|}{\|z - z'\|}$$
using projected gradient methods, which scale linearly in $n$.

All other computations (Hessian inversion, gradient evaluation) follow standard influence-function workflows (Li et al., 9 Dec 2025).
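A minimal sketch of the exact $O(n^2)$ pairwise-slope estimate and its subsampled approximation, using a hypothetical smooth score in place of the influence functional:

```python
import numpy as np

# Pairwise-slope Lipschitz estimation: exact O(n^2) vs. a cheap subsample.
# `scores` stands in for the influence functional; here it is a hypothetical
# smooth function of the data, not the paper's.
rng = np.random.default_rng(3)
n, d = 400, 8
Z = rng.normal(size=(n, d))
w = rng.normal(size=d)
scores = np.tanh(Z @ w)  # Lipschitz in Z with constant at most ||w||

def pairwise_lipschitz(Z, f, pairs=None):
    # Largest slope |f(z) - f(z')| / ||z - z'|| over the given index pairs.
    if pairs is None:  # exact: all n(n-1)/2 pairs, O(n^2)
        i, j = np.triu_indices(len(Z), k=1)
    else:
        i, j = pairs
    num = np.abs(f[i] - f[j])
    den = np.linalg.norm(Z[i] - Z[j], axis=1)
    return (num / den).max()

L_exact = pairwise_lipschitz(Z, scores)

m = 2000  # random subset of pairs: O(m) work instead of O(n^2)
i = rng.integers(0, n, size=m)
j = rng.integers(0, n, size=m)
keep = i != j
L_sub = pairwise_lipschitz(Z, scores, pairs=(i[keep], j[keep]))
print(L_sub <= L_exact)  # a max over fewer pairs can only underestimate
```

Since subsampling only underestimates the maximum slope, a safety margin on $L$ is advisable when the resulting interval is used as a certificate.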
6. Extensions to Deep Networks and the Spectral Amplification Barrier
In non-convex deep networks, the parameter solution map may change discontinuously under data perturbations, rendering classical W-RIF constructions invalid. TRAK (a linearized attribution method at the fixed network) is formulated as
$$\tau(z, z_{\mathrm{test}}) = \phi(z_{\mathrm{test}})^{\top} (\Phi^{\top} \Phi)^{-1} \phi(z),$$
where $\phi(\cdot)$ is the (randomly projected) gradient feature map and $\Phi$ stacks the training features $\phi(z_1), \dots, \phi(z_n)$. However, naive use of Euclidean $\epsilon$-balls in feature space results in vacuous certificates, because the relevant Lipschitz constant scales inversely with the smallest eigenvalue of $\Phi^{\top}\Phi$, and deep representations typically exhibit severe ill-conditioning, with condition numbers spanning many orders of magnitude. Empirically, such Euclidean certificates cover only a negligible fraction of ranking pairs (Li et al., 9 Dec 2025).
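The spectral-amplification effect is easy to reproduce in a toy linearized setting (synthetic features, not the paper's): shrinking the smallest eigenvalues of $\Phi^{\top}\Phi$ inflates both the condition number and the Euclidean Lipschitz constant of the score.

```python
import numpy as np

# Toy demonstration of spectral amplification: the linearized score
# tau(phi) = g^T phi with g = (Phi^T Phi)^{-1} phi_test is linear in the
# feature phi, so its Euclidean Lipschitz constant is ||(Phi^T Phi)^{-1} phi_test||,
# which blows up as the smallest eigenvalue of Phi^T Phi shrinks.
rng = np.random.default_rng(4)
n, d = 500, 20
U = rng.normal(size=(n, d))

def euclid_lipschitz(scales):
    Phi = U * scales                  # feature columns with controlled spectrum
    G = Phi.T @ Phi
    phi_test = Phi[0]                 # hypothetical test-point feature
    return np.linalg.norm(np.linalg.solve(G, phi_test)), np.linalg.cond(G)

L_good, cond_good = euclid_lipschitz(np.ones(d))
L_bad, cond_bad = euclid_lipschitz(np.logspace(0, -3, d))  # ill-conditioned
print(cond_bad / cond_good, L_bad / L_good)  # both ratios explode
```

Since the certified half-width scales with this Lipschitz constant, an ill-conditioned $\Phi^{\top}\Phi$ makes the Euclidean interval uselessly wide, which motivates changing the geometry instead.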
7. Natural Wasserstein Metric and Robust Neural Attribution
To address spectral amplification, the Natural Wasserstein metric replaces the Euclidean ground cost in feature space with the Mahalanobis cost induced by the feature covariance:
$$d_{\mathrm{nat}}(z, z')^2 = \left( \phi(z) - \phi(z') \right)^{\top} (\Phi^{\top}\Phi)^{-1} \left( \phi(z) - \phi(z') \right).$$
In this induced geometry, the Lipschitz constant of the attribution score is governed by the data-dependent "Self-Influence" score $\mathrm{SI}(z_{\mathrm{test}}) = \phi(z_{\mathrm{test}})^{\top} (\Phi^{\top}\Phi)^{-1} \phi(z_{\mathrm{test}})$. The critical bound is
$$\left| \tau(z', z_{\mathrm{test}}) - \tau(z, z_{\mathrm{test}}) \right| \le \sqrt{\mathrm{SI}(z_{\mathrm{test}})} \; d_{\mathrm{nat}}(z, z').$$
Empirically, this yields certified intervals that are at least an order of magnitude tighter than Euclidean baselines. On CIFAR-10 with ResNet-18, Natural W-TRAK certificates cover the large majority of ranking pairs, in contrast to the near-vacuous coverage of Euclidean approaches (Li et al., 9 Dec 2025).
Furthermore, Self-Influence not only certifies attribution robustness but also provides a mathematically grounded leverage score for anomaly detection, achieving AUROC $0.970$ for label-noise detection and recovering the bulk of corrupted labels within the top-ranked fraction of the training data (Li et al., 9 Dec 2025).
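In the linearized feature model, Self-Influence coincides with the classical statistical leverage score, which is what makes it usable for anomaly detection; a toy sketch with one injected anomalous feature vector (synthetic data, not the paper's experiment):

```python
import numpy as np

# Self-Influence SI(z_i) = phi_i^T (Phi^T Phi)^{-1} phi_i is the classical
# leverage score (diagonal of the hat matrix), so anomalous feature vectors
# receive large scores. Synthetic features with one injected anomaly.
rng = np.random.default_rng(5)
n, d = 300, 10
Phi = rng.normal(size=(n, d))
Phi[0] = 8.0 * rng.normal(size=d)  # one feature vector with inflated norm

G_inv = np.linalg.inv(Phi.T @ Phi)
# Rowwise quadratic form phi_i^T G_inv phi_i for every training point.
self_influence = np.einsum('ij,jk,ik->i', Phi, G_inv, Phi)

print(int(np.argmax(self_influence)))  # the injected anomaly ranks first
```

Ranking training points by this score and inspecting the top fraction is the sketch's analogue of the label-noise screening described above.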
Wasserstein-Robust Influence Functions unify classical and modern approaches to data attribution by extending influence-function analysis to robust, distributional settings. For convex models, W-RIF yields certified intervals, accurate to first order in the perturbation radius, with provable guarantees for leave-one-out and population influence. For deep networks, robust certification is only achievable by linearizing at the feature level and measuring perturbations in the Natural Wasserstein metric, thereby circumventing spectral amplification. The theory provides a formal basis for certified data valuation, debugging, unlearning, and robust anomaly detection in high-dimensional, non-convex machine learning models (Li et al., 9 Dec 2025).