Influence Functions in Deep Learning
- Influence functions are first-order methods that quantify the effect of individual training samples on deep neural network parameters and predictions.
- They utilize techniques such as iterative solvers, curvature approximations, and Bayesian methods to make computation feasible in large-scale, non-convex settings.
- Applications include mislabel detection, data cleansing, uncertainty quantification, fairness auditing, and targeted model editing, driving ongoing advances in algorithmic reliability.
Influence functions provide a first-order, functional-analytic framework for quantifying the effect of infinitesimal perturbations to individual training data points on learned model parameters and downstream predictions. In deep learning, influence functions have emerged as a fundamental technique for interpreting model predictions, tracing misbehaving examples, auditing training data, quantifying uncertainty, auditing fairness, and managing large-scale data-centric pipelines. Originally from robust statistics, these methods were adapted to deep neural networks by Koh and Liang, leading to a wave of algorithmic, theoretical, and empirical research on their tractability, reliability, and extensions.
1. Mathematical Foundations and Core Formulation
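For a model parametrized by $\theta \in \mathbb{R}^p$ trained on dataset $D = \{z_i\}_{i=1}^n$ through minimization of the empirical risk $R(\theta) = \frac{1}{n}\sum_{i=1}^n \ell(z_i, \theta)$, the influence function quantifies how an infinitesimal up-weighting of a sample $z$ affects the fitted parameter $\hat\theta$ and a downstream quantity, typically a loss evaluated at some test point $z_{\text{test}}$. Up-weighting $z$ by $\epsilon$ yields the perturbed empirical risk:
$$R_{\epsilon,z}(\theta) = R(\theta) + \epsilon\,\ell(z, \theta).$$
Letting $\hat\theta_{\epsilon,z}$ be the minimizer of $R_{\epsilon,z}$, the first-order influence on the parameters is
$$\mathcal{I}_{\text{params}}(z) = \left.\frac{d\hat\theta_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0} = -H^{-1}\,\nabla_\theta \ell(z, \hat\theta),$$
where $H = \nabla_\theta^2 R(\hat\theta)$ is the Hessian at $\hat\theta$. The influence on a downstream test loss is therefore
$$\mathcal{I}(z, z_{\text{test}}) = -\nabla_\theta \ell(z_{\text{test}}, \hat\theta)^\top H^{-1}\,\nabla_\theta \ell(z, \hat\theta)$$
(Alaa et al., 2020, Han et al., 2020, Zhu et al., 10 Aug 2025). This formula applies both in regression and classification, and extends to arbitrary differentiable functionals $f(\theta)$ by using the sensitivity $\nabla_\theta f(\hat\theta)$ in place of the test-loss gradient (Wang et al., 2022).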
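The influence of a training point on a test loss, i.e. minus the test-loss gradient times the inverse Hessian times the training-loss gradient, can be checked against literal leave-one-out retraining in a convex setting. The following is a minimal sketch, assuming ridge regression on synthetic data so that the Hessian is exact and retraining is a linear solve. The function names, the regularization strength `lam`, and the data are illustrative assumptions, not taken from the cited works. Removing a point corresponds to up-weighting it by $\epsilon = -1/n$, so the predicted change in test loss is the influence score multiplied by $-1/n$.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimise (1/n) * sum_i 0.5*(x_i.theta - y_i)^2 + 0.5*lam*||theta||^2."""
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)        # exact Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y / n)
    return theta, H

def grad_loss(x, y, theta):
    """Gradient of the per-sample loss 0.5*(x.theta - y)^2 w.r.t. theta."""
    return (x @ theta - y) * x

def influence_on_test_loss(X, y, x_test, y_test, lam):
    """Scores -grad_test^T H^{-1} grad_i for every training point i."""
    theta, H = fit_ridge(X, y, lam)
    g_test = grad_loss(x_test, y_test, theta)
    G = (X @ theta - y)[:, None] * X          # row i is the gradient at z_i
    return -G @ np.linalg.solve(H, g_test)

def loo_test_loss_change(X, y, x_test, y_test, lam):
    """Actual change in test loss when each point is removed and the model refit."""
    n, d = X.shape
    theta, _ = fit_ridge(X, y, lam)
    base = 0.5 * (x_test @ theta - y_test) ** 2
    changes = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Keep the 1/n scaling so that removal is exactly epsilon = -1/n.
        H_i = X[keep].T @ X[keep] / n + lam * np.eye(d)
        theta_i = np.linalg.solve(H_i, X[keep].T @ y[keep] / n)
        changes[i] = 0.5 * (x_test @ theta_i - y_test) ** 2 - base
    return changes

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.5 * rng.normal(size=n)
x_test = rng.normal(size=d)
y_test = x_test @ theta_true + 1.0            # fixed offset keeps the test gradient non-zero

influence = influence_on_test_loss(X, y, x_test, y_test, lam)
predicted = -influence / n                    # removal = up-weighting by -1/n
actual = loo_test_loss_change(X, y, x_test, y_test, lam)
corr = np.corrcoef(predicted, actual)[0, 1]
print(f"Pearson correlation, predicted vs. actual LOO change: {corr:.4f}")
```

In this strongly convex setting the two quantities should agree closely; the non-convex case is where the approximation is expected to degrade.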
2. Computational Methods and Curvature Approximations
Due to the high parameter dimensionality $p$ and the non-convexity typical in deep neural nets, direct computation of the inverse Hessian $H^{-1}$ is infeasible. Three principal strategies are used:
- Iterative solvers (Conjugate Gradient, LiSSA): Rely on Hessian–vector products (HVPs) computable by automatic differentiation; require $O(kp)$ time, where $k$ is the number of HVPs (Han et al., 2020, Zhu et al., 10 Aug 2025).
- Structured curvature approximations: Generalized Gauss–Newton (GGN), Block-diagonal GGN, Kronecker-Factored Approximate Curvature (K-FAC), and Eigenvalue-Corrected K-FAC (EK-FAC) trade off cost and fidelity. Higher-fidelity approximations (EK-FAC) recover more accurate influence scores, especially in deep nets, while K-FAC, though cheap, incurs significant ranking errors for influential points (Hong et al., 27 Sep 2025).
- Hessian-free and Bayesian: Classical IFs fail in singular curvature regimes; Bayesian influence functions (BIF) circumvent inversion by estimating covariances over SGLD samples from a local posterior, scaling to networks with billions of parameters (Kreer et al., 30 Sep 2025).
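The iterative-solver strategy can be illustrated with conjugate gradient, which solves a linear system in the Hessian using only Hessian–vector products. The sketch below is a minimal illustration under stated assumptions: the model is logistic regression, so the HVP is written analytically rather than obtained by automatic differentiation, and a small `damping` term is added to keep the system positive definite. The helper names `make_hvp` and `conjugate_gradient` are hypothetical. LiSSA is not shown.

```python
import numpy as np

def make_hvp(X, theta, damping):
    """Return v -> (H + damping*I) v for the mean logistic loss, without forming H."""
    probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
    weights = probs * (1.0 - probs)           # per-sample curvature weights
    n = X.shape[0]

    def hvp(v):
        return X.T @ (weights * (X @ v)) / n + damping * v

    return hvp

def conjugate_gradient(hvp, b, max_iters=100, tol=1e-10):
    """Solve (H + damping*I) x = b using only Hessian-vector products."""
    x = np.zeros_like(b)
    r = b - hvp(x)                            # residual
    direction = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        if np.sqrt(rs_old) < tol:
            break
        Hd = hvp(direction)
        alpha = rs_old / (direction @ Hd)
        x += alpha * direction
        r -= alpha * Hd
        rs_new = r @ r
        direction = r + (rs_new / rs_old) * direction
        rs_old = rs_new
    return x

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
theta = 0.5 * rng.normal(size=10)
v = rng.normal(size=10)                       # stands in for a test-loss gradient
damping = 0.01

x_cg = conjugate_gradient(make_hvp(X, theta, damping), v)

# Explicit Hessian, formed here only to check the matrix-free solve.
probs = 1.0 / (1.0 + np.exp(-(X @ theta)))
H_explicit = X.T @ ((probs * (1 - probs))[:, None] * X) / len(X) + damping * np.eye(10)
print(np.max(np.abs(x_cg - np.linalg.solve(H_explicit, v))))
```

Each iteration costs one HVP, so the total cost scales with the number of iterations times the cost of a single product, which is what makes the approach viable when the Hessian cannot be stored.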
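A further widely adopted, computationally minimal approximation, TracIn or IP (“inner product”), simply replaces the inverse Hessian $H^{-1}$ by the identity, yielding
$$\mathcal{I}_{\text{IP}}(z, z_{\text{test}}) = -\nabla_\theta \ell(z_{\text{test}}, \hat\theta)^\top \nabla_\theta \ell(z, \hat\theta).$$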
While this can be effective in practice for ranking, it does not yield a true leave-one-out effect (Yang et al., 2024).
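The difference between the curvature-aware score and the inner-product shortcut can be made concrete with a small sketch. It assumes per-sample gradients are already available as rows of a matrix; the gradients and the positive definite matrix `H` below are synthetic stand-ins, and the function names are hypothetical. The sign convention follows the first-order formulation, in which a negative score means that up-weighting the training point lowers the test loss.

```python
import numpy as np

def hessian_influence(train_grads, test_grad, H):
    """Curvature-aware scores: -g_test^T H^{-1} g_i for each row g_i."""
    return -train_grads @ np.linalg.solve(H, test_grad)

def inner_product_influence(train_grads, test_grad):
    """Identity-curvature scores: -g_test^T g_i for each row g_i."""
    return -train_grads @ test_grad

def spearman(a, b):
    """Spearman rank correlation (assumes no ties)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

rng = np.random.default_rng(0)
n, p = 50, 8
train_grads = rng.normal(size=(n, p))         # hypothetical per-sample gradients
test_grad = rng.normal(size=p)
A = rng.normal(size=(p, p))
H = A @ A.T / p + 0.1 * np.eye(p)             # synthetic SPD stand-in for a damped Hessian

exact = hessian_influence(train_grads, test_grad, H)
cheap = inner_product_influence(train_grads, test_grad)
print(f"Spearman correlation of the two rankings: {spearman(exact, cheap):.3f}")
```

When `H` is the identity the two scores coincide; how far the rankings diverge otherwise depends on how anisotropic the curvature is.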
3. Theoretical Guarantees, Failure Modes, and Extensions
Classical influence theory assumes strong convexity, smoothness, and convergence to a unique minimum. In modern deep learning, essential caveats and limitations arise:
- Non-convexity and Hessian degeneracy: Neural net loss landscapes exhibit many saddle points and flat directions, so the Hessian $H$ is indefinite or singular; small regularization (damping) or positive semidefinite surrogates (e.g., the GGN or the Fisher) are employed (Basu et al., 2020, Epifano et al., 2023).
- Linearization failure: The first-order Taylor expansion is accurate only locally; in deep, wide nets, higher-order corrections become prominent, especially as the largest eigenvalue of the Hessian grows with depth (Ye et al., 25 May 2025, Epifano et al., 2023).
- Fading of predictive power: The difference between the parameters of the perturbed and unperturbed training runs can grow super-linearly with the number of training steps after a perturbation, rapidly invalidating the local first-order prediction (see the discrete Grönwall bound and empirical results in Schioppa et al., 2023).
- Distinction between matched-objective, cross-loss, and “true” leave-one-out retraining: Recent theory shows that standard IFs with fixed Hessian most closely approximate the proximal Bregman response function (PBRF), not literal cold-start LOO retraining. This resolves the discrepancy between high empirical utility and poor LOO alignment in deep nets (Bae et al., 2022).
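The Hessian degeneracy issue and the usual remedies can be seen in a two-parameter toy model. The sketch below assumes a scalar "two-layer linear network" $f(\theta) = \theta_0 \theta_1 x$ with squared loss, chosen here purely for illustration. Its exact Hessian splits into a Gauss-Newton term, which is always positive semidefinite, and a residual-weighted term that can make the full Hessian indefinite. The `damping` value is an arbitrary assumption.

```python
import numpy as np

def hessian_and_ggn(theta, x, y):
    """Exact Hessian and GGN of the loss 0.5*(theta0*theta1*x - y)^2."""
    t0, t1 = theta
    residual = t0 * t1 * x - y
    J = x * np.array([t1, t0])                # Jacobian of the model output w.r.t. theta
    ggn = np.outer(J, J)                      # Gauss-Newton term, always PSD
    second_order = residual * x * np.array([[0.0, 1.0], [1.0, 0.0]])
    return ggn + second_order, ggn

H, GGN = hessian_and_ggn(np.array([1.0, 1.0]), x=1.0, y=4.0)
damping = 0.1
damped_GGN = GGN + damping * np.eye(2)

print("Hessian eigenvalues:   ", np.linalg.eigvalsh(H))           # one is negative
print("GGN eigenvalues:       ", np.linalg.eigvalsh(GGN))         # non-negative, one is zero
print("Damped GGN eigenvalues:", np.linalg.eigvalsh(damped_GGN))  # strictly positive
```

At this point the Hessian has eigenvalues $-1$ and $3$, so $H^{-1}$ exists but does not describe a minimum. The GGN is singular, and only the damped GGN yields a well-posed positive definite solve.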
Recent advances include:
- Higher-order influence (HOIF): Taylor expansion to second or higher order; key to uncertainty quantification methods like the Discriminative Jackknife (Alaa et al., 2020).
- Generalized Influence Functions (GIF): Restricting parameter updates to subspaces associated with an input (e.g., via winning-ticket heuristics) can yield superior fidelity to retraining, empirically showing “less is more” (Lyu et al., 2023).
- Flat-minima IF: Pursuing flatness in the validation risk landscape significantly improves reliability of IF-based loss change predictions (Ye et al., 25 May 2025).
- Bayesian IF (BIF): Local posterior variances can substitute for Hessian reliance, circumventing instabilities in high-dimensional, ill-conditioned regimes (Kreer et al., 30 Sep 2025).
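The idea of replacing Hessian inversion with covariances over posterior samples can be illustrated in a setting where the answer is known. The sketch below is not the BIF algorithm of the cited work: it assumes ridge regression, and it draws exact Gaussian samples from $\mathcal{N}(\hat\theta, \tau H^{-1})$ as a stand-in for SGLD sampling from a local posterior. The temperature `tau` and all names are hypothetical. For small `tau`, the covariance between a training loss and the test loss across samples is approximately $\tau$ times the gradient-inverse-Hessian-gradient product, so the negated, rescaled covariance should track the classical score.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, tau, n_samples = 100, 4, 0.1, 1e-3, 20000
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.5 * rng.normal(size=n)
x_test = rng.normal(size=d)
y_test = x_test @ theta_true + 1.0            # offset keeps the test gradient non-zero

H = X.T @ X / n + lam * np.eye(d)
theta_hat = np.linalg.solve(H, X.T @ y / n)

def losses(thetas, X, y):
    """Per-sample squared-error losses for each parameter draw; shape (S, n)."""
    return 0.5 * (thetas @ X.T - y) ** 2

# Assumption: exact Gaussian draws replace SGLD samples from the local posterior.
cov_matrix = tau * np.linalg.inv(H)
cov_matrix = 0.5 * (cov_matrix + cov_matrix.T)          # enforce exact symmetry
thetas = rng.multivariate_normal(theta_hat, cov_matrix, size=n_samples)

train_losses = losses(thetas, X, y)                                      # (S, n)
test_losses = losses(thetas, x_test[None, :], np.array([y_test]))[:, 0]  # (S,)

# Sample covariance between each training loss and the test loss.
centered_train = train_losses - train_losses.mean(axis=0)
centered_test = test_losses - test_losses.mean()
loss_cov = centered_train.T @ centered_test / (n_samples - 1)
sampled_scores = -loss_cov / tau

# Classical scores for comparison (uses the explicit inverse-Hessian solve).
G = (X @ theta_hat - y)[:, None] * X
g_test = (x_test @ theta_hat - y_test) * x_test
classical = -G @ np.linalg.solve(H, g_test)

corr = np.corrcoef(sampled_scores, classical)[0, 1]
print(f"Correlation between sampled and classical scores: {corr:.4f}")
```

The sampled estimate never solves a linear system in `H` at scoring time; in this sketch `H` is used only to construct the toy posterior and the reference scores.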
4. Algorithmic and Workflow Developments
Influence functions are embedded in a variety of practical workflows:
- Top-$k$ influence retrieval: Given a test instance, identify the most supportive or harmful training examples by ranking the influence scores $\mathcal{I}(z_i, z_{\text{test}})$, using iterative HVP solvers or TracIn as appropriate (Han et al., 2020, Yang et al., 2024). FastIF and storage-efficient variants further accelerate the computation by exploiting checkpoint symmetries or caching only final weights (Suzuki et al., 2021).
- Data cleansing: Training data with large self-influence (high $|\mathcal{I}(z_i, z_i)|$) are probable outliers and their removal measurably increases accuracy; implemented in AutoML and GUI toolchains (Suzuki et al., 2021).
- Uncertainty quantification: DJ and related methods use HOIF to efficiently approximate leave-one-out parameters and construct coverage intervals with frequentist guarantees competitive with full Bayesian approaches (Alaa et al., 2020).
- Active label correction and annotation feedback: InfFeed iteratively surfaces negatively influential examples for human cross-check, accelerating dataset curation with dramatic annotation efficiency (Banerjee et al., 2024).
- Fairness optimization: FairIF solves for influence-based reweighting minimizing group disparity metrics (AD, AOD, EOD) using only a validation set with sensitive attributes (Wang et al., 2022).
- Targeted model editing or fine-tuning: Selected highly influential training points are up/down-weighted or relabeled to drive model outputs or error corrections, enabling micro-edits without full retraining (Tuononen et al., 19 Sep 2025, Schioppa et al., 2023).
- Cross-loss attributions: CLIF enables tracing which training samples, learned under one loss, drive performance on any downstream (possibly unsupervised or bias-oriented) test loss (Silva et al., 2020).
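The retrieval step shared by several of these workflows reduces to sorting a vector of influence scores. The helper below is a minimal sketch under the sign convention of the first-order formulation, where a negative score means that up-weighting the training point lowers the test loss. The function name and the example scores are hypothetical.

```python
import numpy as np

def top_k_influential(scores, k):
    """Return (helpful, harmful) index arrays of length k.

    helpful: most negative scores (up-weighting lowers the test loss).
    harmful: most positive scores (up-weighting raises the test loss).
    """
    order = np.argsort(scores)                # ascending: most negative first
    helpful = order[:k]
    harmful = order[::-1][:k]
    return helpful, harmful

scores = np.array([0.3, -1.2, 0.0, 2.5, -0.4])   # hypothetical influence scores
helpful, harmful = top_k_influential(scores, k=2)
print("most helpful:", helpful)   # indices 1 and 4
print("most harmful:", harmful)   # indices 3 and 0
```

Estimators that use the opposite sign convention would need the two returned arrays swapped.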
A summary of commonly used influence algorithms and their computational characteristics appears below.
| Approach | Fidelity to LOO/PBRF | Handles non-convexity? | Cost per query | Notes |
|---|---|---|---|---|
| CG/LiSSA | High (locally) | Partially | O(k·p) HVPs | Assumes $H$ invertible or adequately regularized |
| EK-FAC | Moderate-High | Layerwise (GGN/Fisher) | O(p) | Poor at capturing fine cross-layer correlation |
| TracIn/IP | Lower (correlational) | Robust to non-convex | O(p) | Fast, often sufficient for ranking |
| Bayesian IF | High | Yes, Hessian-free | O(Tp) (SGMCMC) | Requires parallel batch/Langevin sampling |
| HOIF/DJ | High (coverage) | Empirically robust | O(npm) | Complexity grows with number of HOIF orders $m$ |
5. Empirical Performance and Benchmarks
Empirical studies across vision, NLP, tabular, and synthetic domains reveal several practical touchstones:
- Mislabel detection: Self-influence $\mathcal{I}(z_i, z_i)$ robustly identifies mislabels or outliers; top-ranked points recover most corrupted examples even on large datasets (Zhu et al., 10 Aug 2025).
- Explanation tasks: In tasks requiring high-level reasoning (e.g., NLI), influence-based data attributions outperform token-level saliency in surfacing supporting evidence, capturing global artifacts like lexical overlap (Han et al., 2020).
- Uncertainty intervals: DJ achieves near-nominal coverage and strong discrimination between high- and low-error predictions compared to MC Dropout, Bayesian NNs, and ensembles across UCI regression tasks; see Section 7 of (Alaa et al., 2020) for comparative statistics.
- Fine-tuning and unlearning: GIF updating 5% of model parameters matches or exceeds retrain accuracy for class removal and backdoor elimination, outperforming full-space IF, few-shot unlearning, and second-order IFs (Lyu et al., 2023).
- Fairness debiasing: FairIF on CI-MNIST, Adult, COMPAS, and CelebA achieves parity or better accuracy vs. domain-specific baselines, often halving fairness gaps with minimal accuracy loss (Wang et al., 2022).
- Scaling and runtime: Storage-efficient and Bayesian IFs enable application to models with 2.8B+ parameters (e.g., Pythia-2.8B), reducing runtime from days (EK-FAC) to hours (BIF) (Kreer et al., 30 Sep 2025).
- Synthetic evaluation: In convex (linear/logistic) regimes, classical IFs attain near-perfect Spearman/Pearson correlation with leave-one-out retrain shifts; under non-convexity, correlation generally decays with depth and network width, and is highly test-point dependent (Basu et al., 2020, Epifano et al., 2023).
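The mislabel-detection use of self-influence can be reproduced on a toy problem. The sketch below assumes L2-regularized logistic regression on two well-separated synthetic Gaussian classes with a handful of deliberately flipped labels. The data, the hyperparameters, and the helper names are illustrative assumptions and do not reproduce any benchmark from the cited studies. Points are ranked by the magnitude of their self-influence, i.e. the gradient-inverse-Hessian-gradient product of each point with itself.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, y, lam, lr=0.1, steps=3000):
    """Gradient descent on the mean logistic loss + 0.5*lam*||theta||^2."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y) + lam * theta
        theta -= lr * grad
    return theta

def self_influence(X, y, theta, lam):
    """g_i^T H^{-1} g_i for each training point (magnitude of its self-influence)."""
    probs = sigmoid(X @ theta)
    H = X.T @ ((probs * (1 - probs))[:, None] * X) / len(y) + lam * np.eye(X.shape[1])
    G = (probs - y)[:, None] * X              # per-sample loss gradients
    return np.einsum("ij,ij->i", G, np.linalg.solve(H, G.T).T)

rng = np.random.default_rng(0)
n_per_class, n_flipped, lam = 100, 10, 0.01
X = np.vstack([rng.normal(loc=-3.0, size=(n_per_class, 2)),
               rng.normal(loc=+3.0, size=(n_per_class, 2))])
y_clean = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Corrupt a few labels; these indices are what the ranking should recover.
flipped = rng.choice(len(y_clean), size=n_flipped, replace=False)
y = y_clean.copy()
y[flipped] = 1.0 - y[flipped]

theta = fit_logistic(X, y, lam)
scores = self_influence(X, y, theta, lam)
top = np.argsort(scores)[::-1][:n_flipped]    # highest self-influence first
recovered = len(set(top.tolist()) & set(flipped.tolist()))
print(f"{recovered} of {n_flipped} flipped labels appear in the top {n_flipped}")
```

In this synthetic setup the flipped points sit deep inside the opposite class, so their gradients are large and they are expected to dominate the top of the ranking. Real datasets with ambiguous or boundary examples are a harder case.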
6. Limitations, Fragility, and Active Research Directions
Despite broad adoption, several failure modes and research challenges for influence functions persist:
- Fragility to non-convexity: Influence scores can be highly unstable in deep, overparameterized architectures, especially under low regularization, and may poorly predict leave-one-out retrain effects (Basu et al., 2020, Epifano et al., 2023, Schioppa et al., 2023).
- Hessian approximations: Structured curvature methods gain scalability at the cost of fidelity to full Hessian inversion; eigenvalue correction (EK-FAC) is critical for preserving the ranking of the truly most influential points (Hong et al., 27 Sep 2025).
- Locality only: Influence is a local, first-order object. It ceases to match actual retrain effects after more than a few fine-tuning steps, and higher-order corrections (HOIF, BIF) are required for improved coverage or global fidelity (Alaa et al., 2020, Kreer et al., 30 Sep 2025).
- PBRF vs. LOO semantics: Influence-based updates on deep nets answer a proximal Bregman-regularized problem, not the literal cold-start leave-one-out question (Bae et al., 2022).
- Curation and annotation risk: Influence-guided feedback loops such as InfFeed require careful pipeline auditing and robust annotation, since propagating corrections from high-influence but misannotated examples could bias subsequent models (Banerjee et al., 2024).
- Open theoretical questions: Characterizing precise breakdown boundaries for first-order validity, scaling BIF to continual/online settings, and exploring data attribution beyond additive (linear) regimes remain open (Saunshi et al., 2022, Zhu et al., 10 Aug 2025).
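The locality limitation can be quantified exactly in a convex toy case. The sketch below assumes ridge regression, where the minimizer of the up-weighted objective is available in closed form for any weight $\epsilon$, and compares it with the first-order prediction $-\epsilon H^{-1}\nabla_\theta \ell(z, \hat\theta)$. The data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

H = X.T @ X / n + lam * np.eye(d)             # Hessian of the unperturbed objective
b = X.T @ y / n
theta_hat = np.linalg.solve(H, b)

x_z, y_z = X[0], y[0]                         # the up-weighted training point
g_z = (x_z @ theta_hat - y_z) * x_z           # its loss gradient at theta_hat

def relative_error(eps):
    """Relative error of the first-order parameter change for up-weight eps."""
    # Exact minimiser of R(theta) + eps * 0.5*(x_z.theta - y_z)^2.
    theta_eps = np.linalg.solve(H + eps * np.outer(x_z, x_z), b + eps * y_z * x_z)
    exact = theta_eps - theta_hat
    first_order = -eps * np.linalg.solve(H, g_z)
    return np.linalg.norm(first_order - exact) / np.linalg.norm(exact)

epsilons = [1e-3, 1e-2, 1e-1, 1.0]
errors = [relative_error(eps) for eps in epsilons]
leverage = x_z @ np.linalg.solve(H, x_z)      # x_z^T H^{-1} x_z
for eps, err in zip(epsilons, errors):
    print(f"eps={eps:g}  relative error={err:.4f}")
```

By the Sherman–Morrison identity, the relative error in this setting equals $\epsilon\, x_z^\top H^{-1} x_z$, so it grows linearly with the size of the perturbation. In non-convex models the breakdown is not available in closed form and can be considerably faster.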
7. Impact, Applications, and Future Trajectories
Influence functions underpin a growing ecosystem of data-centric deep learning methodologies:
- Explainable AI: Providing faithful, data-level explanations for model predictions, surfacing dataset artifacts and hidden biases (Han et al., 2020, Saunshi et al., 2022).
- Data attribution & model accountability: Tracing spurious correlations, detecting poisoned or mislabeled samples, and supporting dynamic data maintenance (Zhu et al., 10 Aug 2025, Silva et al., 2020).
- Unlearning and debugging: Enabling rapid, targeted weight updates or removals for privacy (machine unlearning), robustness, or correction of erroneous behaviors (Lyu et al., 2023, Bae et al., 2022).
- Uncertainty quantification and calibration: Constructing confidence intervals with frequentist coverage guarantees and adaptive discrimination (Alaa et al., 2020).
- Fairness and bias mitigation: Automating label weighting or selection schemes to enforce demographic parity or other group fairness criteria efficiently and stably (Wang et al., 2022).
- Large-model scaling: Bayesian and stochastic approaches (local BIF, SGMCMC) point towards tractable, Hessian-free data attribution pipelines for trillion-parameter regimes (Kreer et al., 30 Sep 2025).
Active research addresses robustness to non-convexity, efficient and stable curvature estimation (EK-FAC, block-GGN, BIF), hybrid Bayesian–frequentist uncertainty approaches, automated group influence and artifact detection, and theoretically sound, fast influence estimators for streaming and federated learning environments. The field continues to expand in both fundamental theory and end-to-end data-centric systems.
Influence functions in deep learning thus constitute a core theoretical and algorithmic primitive for data attribution, interpretability, uncertainty, and fairness, with ongoing refinements to handle the unique challenges of deep, large-scale, and high-stakes models (Alaa et al., 2020, Basu et al., 2020, Zhu et al., 10 Aug 2025, Hong et al., 27 Sep 2025, Lyu et al., 2023).