
Self-Influence Term

Updated 11 December 2025
  • The self-influence term is a measure quantifying an entity's direct impact on its own outcome across machine learning, bibliometric, self-supervised learning, and network diffusion contexts.
  • It is computed using techniques like gradient approximations, Hessian inversions, and power iterations to inform tasks such as data weighting, bias detection, and privacy inference.
  • Its multifaceted formulations support robust dataset selection and theoretical guarantees, driving practical insights into model stability and organic influence propagation.

The self-influence term quantifies the degree to which an entity (sample, node, or actor) exerts direct influence on itself within a given modeling framework. Its mathematical and operational character varies across domains, including machine learning, bibliometrics, self-supervised representation learning, and network diffusion. In each of these contexts, self-influence constructs play a critical role in quantifying importance, stability, novelty, or organic propagation, often informing data selection, robustness analysis, ranking, or theoretical guarantees.

1. Formal Definitions and Core Variants

Machine Learning and Data Attribution

In the context of deep neural networks, particularly for LLM pre-training, the self-influence (SI) of a data sample $z$ is grounded in first-order approximations of classical influence functions. For a model $f_\theta$ with parameters $\theta$, loss $\ell(f_\theta, z)$, and per-sample gradient $g(f_\theta, z) = \nabla_\theta \ell(f_\theta, z)$, the self-influence score is:

$$\mathrm{SI}(z) = \|\nabla_\theta \ell(f_\theta, z)\|^2$$

Optionally, this can be restricted to a subset $\mathcal{K}$ of layers:

$$\mathrm{SI}_{\mathcal{K}}(z) = \sum_{k \in \mathcal{K}} \|\nabla_{\theta_k} \ell(f_\theta, z)\|^2$$

This first-order score serves as a scalable proxy for the instantaneous effect of $z$ on its own loss, as developed in the context of the TracIn framework (Thakkar et al., 2023).
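
As a concrete illustration, a minimal PyTorch-style sketch of this per-sample computation is given below; the `model`, `loss_fn`, sample format, and layer-prefix filter are hypothetical placeholders rather than the cited work's implementation.

```python
import torch

def self_influence(model, loss_fn, sample, layer_prefixes=None):
    """First-order self-influence: squared norm of the per-sample loss gradient,
    optionally restricted to parameters whose names start with layer_prefixes."""
    inputs, target = sample
    loss = loss_fn(model(inputs), target)
    params = [
        p for name, p in model.named_parameters()
        if p.requires_grad and (
            layer_prefixes is None
            or any(name.startswith(pref) for pref in layer_prefixes)
        )
    ]
    grads = torch.autograd.grad(loss, params)
    return sum(g.pow(2).sum() for g in grads).item()
```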

In full influence-function formalism, as used for privacy attacks and classical supervised learning, the self-influence (SIF) includes the local curvature via the Hessian:

$$I_\mathrm{SIF}(z) = -\nabla_\theta L(z, \hat\theta)^\top\, H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta)$$

where $H_{\hat\theta}$ is the empirical risk Hessian (Cohen et al., 2022, Harilal et al., 22 Dec 2024).
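
A sketch of how $I_\mathrm{SIF}$ can be evaluated without materializing the Hessian, using a conjugate-gradient solve over Hessian-vector products, is shown below; the damping constant, iteration counts, and function signature are illustrative assumptions, not the cited papers' exact implementation.

```python
import torch

def flat(tensors):
    """Flatten a list of tensors into a single vector."""
    return torch.cat([t.reshape(-1) for t in tensors])

def self_influence_sif(loss_fn, params, damping=0.01, cg_iters=50, tol=1e-8):
    """SIF(z) = -grad^T H^{-1} grad, approximating H^{-1} grad by conjugate
    gradients over Hessian-vector products. `loss_fn` recomputes the per-sample
    loss so each product uses a fresh autograd graph."""
    g = flat(torch.autograd.grad(loss_fn(), params)).detach()

    def hvp(v):
        # (H + damping * I) v via double backpropagation (Pearlmutter trick)
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(flat(grads) @ v, params)
        return flat(hv).detach() + damping * v

    # Conjugate gradient: solve (H + damping * I) x = g
    x = torch.zeros_like(g)
    r = g - hvp(x)
    p = r.clone()
    rs_old = r @ r
    for _ in range(cg_iters):
        Ap = hvp(p)
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return -(g @ x).item()
```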

Network-Based Influence and Self-Citation

In network and bibliometric models (e.g., Pinski–Narin influence weights), the self-influence term refers to the diagonal entries $c_{ii}$ of the journal-journal citation matrix $C$, corresponding to within-node "self-citation" counts. These are normalized to $m_{ii} = c_{ii} / d_i$ (with $d_i$ the $i$-th column total of $C$) in the column-stochastic influence matrix $M$ and enter recursively into the prestige computation (Prathap et al., 2019).
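
A toy numpy illustration of this normalization step (the citation counts below are made-up numbers for demonstration only):

```python
import numpy as np

# Toy 3-journal citation matrix: C[i, j] = citations from journal j to journal i,
# so the diagonal entries c_ii are the self-citations. Illustrative numbers only.
C = np.array([[30.,  5.,  2.],
              [10., 40.,  8.],
              [ 4.,  6., 20.]])

d = C.sum(axis=0)            # column totals d_j
M = C / d                    # column-stochastic influence matrix, M = C D^{-1}
self_influence = np.diag(M)  # m_ii = c_ii / d_i, the self-citation shares
print(self_influence)
```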

Diffusion and Spontaneous Activation

In probabilistic graph diffusion models (Self-Activation Independent Cascade), the self-influence term is parameterized as $q(u)$, the probability that node $u$ spontaneously adopts, independent of neighborhood or external intervention. This models organic activation as opposed to induced seeding, and $q(u)$ directly contributes to the expected influence spread (Sun et al., 2019).
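
A hedged Monte Carlo sketch of spread estimation under an independent-cascade model with per-node self-activation is shown below; the graph representation, parameter names, and the helper itself are illustrative rather than the paper's reference implementation.

```python
import random

def estimate_spread(graph, p, q, seeds, n_sims=1000):
    """Estimate expected spread when nodes in `seeds` are forced active and every
    node u may also self-activate with probability q[u] (independent cascade).
    `graph` maps node -> list of neighbors, `p` maps (u, v) -> activation prob."""
    total = 0
    for _ in range(n_sims):
        active = set(seeds)
        # organic self-activation, independent of any seeding intervention
        active |= {u for u in graph if random.random() < q[u]}
        frontier = list(active)
        while frontier:
            new_frontier = []
            for u in frontier:
                for v in graph[u]:
                    if v not in active and random.random() < p[(u, v)]:
                        active.add(v)
                        new_frontier.append(v)
            frontier = new_frontier
        total += len(active)
    return total / n_sims
```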

2. Theoretical Underpinnings and Derivation

Self-influence scores emerge from first- or second-order Taylor approximations of the loss landscape or from Markovian stochastic recursions in network models. In the TracIn-based SI, the score is the first-order quantity $\|\nabla_\theta \ell(f_\theta, z)\|^2$, aligning with gradient-magnitude sensitivity. SIF, by contrast, incorporates curvature via the local Hessian inverse (see the equations above), rooted in the implicit function theorem for parameter perturbations under infinitesimal upweighting.
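
For concreteness, the standard derivation behind the SIF expression (a classical influence-function argument, restated here in this article's notation) upweights $z$ by an infinitesimal $\epsilon$ in the empirical risk and tracks the induced parameter shift via the implicit function theorem:

$$\hat\theta_{\epsilon, z} = \arg\min_\theta \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon\, L(z, \theta), \qquad \frac{d\hat\theta_{\epsilon, z}}{d\epsilon}\Big|_{\epsilon=0} = -H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta)$$

Applying the chain rule to the loss of $z$ itself under this parameter shift then recovers the Hessian-weighted quadratic form quoted in Section 1:

$$I_\mathrm{SIF}(z) = \frac{d}{d\epsilon}\, L\big(z, \hat\theta_{\epsilon, z}\big)\Big|_{\epsilon=0} = -\nabla_\theta L(z, \hat\theta)^\top H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta)$$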

In network prestige models, self-influence is absorbed through iterative applications of the influence matrix, leveraging Perron–Frobenius theory for the unique eigenvector solution. In influence maximization with self-activation, probabilistic reasoning over possible worlds aggregates per-node $q(u)$ to model both organic and induced spread, while preserving submodularity and allowing for efficient greedy optimization (Sun et al., 2019).

3. Algorithmic Computation and Practical Workflows

Deep Learning (LLM Pre-Training)

To compute SI across a corpus:

  • For each sample $z_i$, compute $\mathrm{SI}_{\mathcal{K}}(z_i)$ as the squared gradient norm accumulated over the selected layers.
  • This procedure can be used for offline sample selection (retain low SI examples) or online sample weighting (Presence method) (Thakkar et al., 2023).
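
A minimal sketch of both uses follows; the retention fraction and the softmax-temperature weighting are placeholder choices meant to mirror the description in Section 4, not the tuned recipe from the cited work.

```python
import numpy as np

def filter_by_self_influence(si_scores, keep_fraction=0.8):
    """Offline selection: return the indices of the lowest-SI samples."""
    order = np.argsort(si_scores)            # ascending self-influence
    n_keep = int(keep_fraction * len(si_scores))
    return order[:n_keep]

def si_softmax_weights(si_scores, temperature):
    """Online weighting: softmax over SI scores. A positive temperature
    emphasizes high-SI samples; a negative temperature de-emphasizes them
    (cf. the two-stage schedule discussed in Section 4)."""
    logits = np.asarray(si_scores, dtype=float) / temperature
    logits -= logits.max()                    # numerical stability
    w = np.exp(logits)
    return w / w.sum()
```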

For SIF in membership inference:

  • Solve for $H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)$ using Hessian-vector product approximations (Pearlmutter trick, conjugate gradients), making large-scale evaluation tractable in deep networks (Cohen et al., 2022).
  • Fit thresholds on reference member/non-member sets; infer membership by SIF value.
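
A schematic of the thresholding step (the reference-set construction and the simple midpoint rule are illustrative choices, not the attack calibration from the cited work):

```python
import numpy as np

def fit_sif_threshold(member_sifs, nonmember_sifs):
    """Fit a scalar decision threshold from reference SIF magnitudes
    (a simple midpoint rule; illustrative only)."""
    return 0.5 * (np.mean(np.abs(member_sifs)) + np.mean(np.abs(nonmember_sifs)))

def predict_member(sif_score, threshold):
    # Small-magnitude SIF suggests a well-fit, typical point, which Section 4
    # associates with training members.
    return abs(sif_score) < threshold
```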

Representation Learning (Self-Supervised)

For SSL, self-influence is estimated by:

  • Evaluating the SIF-style quadratic form $-\nabla_\theta \ell_i^\top H^{-1} \nabla_\theta \ell_i$ on the self-supervised loss $\ell_i$ of each sample, with the Hessian inverse approximated via Hessian-vector products as in the supervised case (Harilal et al., 22 Dec 2024).

Network and Citation Models

  • Influence weights are computed via power iteration on the normalized matrix $M = C D^{-1}$, with diagonal self-citation terms entering identically to other links but attenuated as the recursion proceeds (Prathap et al., 2019).
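
Continuing the toy example from Section 1, a sketch of the power iteration, together with the with/without self-citation comparison analyzed in Section 5, might look as follows; the iteration count and citation counts are arbitrary.

```python
import numpy as np

def influence_weights(C, n_iter=50):
    """Power iteration on the column-stochastic matrix M = C D^{-1}."""
    M = C / C.sum(axis=0)
    w = np.ones(C.shape[0]) / C.shape[0]
    for _ in range(n_iter):
        w = M @ w
        w /= w.sum()
    return w

# Toy citation matrix reused from the Section 1 sketch (illustrative numbers).
C = np.array([[30.,  5.,  2.],
              [10., 40.,  8.],
              [ 4.,  6., 20.]])

w_with = influence_weights(C)
w_without = influence_weights(C - np.diag(np.diag(C)))  # self-citations removed
print(w_with, w_without)  # compare prestige with and without self-citation terms
```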

Influence Maximization

  • Self-activation parameters $q(u)$ are integrated into simulation-based spread computation and greedy seeding algorithms for maximization tasks, preserving approximation guarantees (Sun et al., 2019).
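
A greedy seeding sketch along these lines is given below; the marginal-gain criterion is the standard greedy heuristic, the budget and simulation counts are placeholders, and it assumes the `estimate_spread` helper from the Section 1 sketch is in scope.

```python
def greedy_seeds(graph, p, q, budget, n_sims=500):
    """Greedy influence maximization with self-activation: repeatedly add the
    node with the largest marginal gain in expected spread, where every spread
    estimate already accounts for the organic activations q(u)."""
    seeds = set()
    for _ in range(budget):
        base = estimate_spread(graph, p, q, seeds, n_sims)
        gains = {
            u: estimate_spread(graph, p, q, seeds | {u}, n_sims) - base
            for u in graph if u not in seeds
        }
        seeds.add(max(gains, key=gains.get))
    return seeds
```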

4. Empirical Properties and Significance

  • In deep learning, high SI often identifies "hard" or under-represented examples in early pre-training, but noisy or outlier samples in late training. A two-stage schedule (positive to negative softmax temperature) in SI-driven weighting first emphasizes, then de-emphasizes, high-SI points—improving both novelty discovery and training stability (Thakkar et al., 2023).
  • In self-supervised learning, high self-influence points are empirically associated with atypical backgrounds or underrepresented subgroups, while low-influence samples are often duplicates. Removal of high-influence points may counterintuitively improve downstream classification, indicating their association with spurious invariances (Harilal et al., 22 Dec 2024).
  • In the SIF context, small SIF values are linked to well-fit, typical (often member) points, whereas large (magnitude) SIF values signal that a sample is impactful in the loss landscape and thus more vulnerable to privacy leakage (Cohen et al., 2022).

5. Integration and Neutralization in Network Measures

  • In Pinski–Narin influence weights, self-citation (self-influence) initially biases prestige, but after sufficient power iterations these diagonal entries become "neutralized," exerting negligible skew on the stationary solution. Empirical studies show that the differences between influence weights computed with and without self-citations decay exponentially and become marginal after a handful of iterations (Prathap et al., 2019).
  • Theoretical analysis connects this neutralization to the column-stochastic normalization and spectral properties of $M$, ensuring that self-influence does not unduly inflate rankings, unlike in raw metrics such as the JIF.

6. Extensions and Modeling Considerations

  • Self-influence can be parameterized heterogeneously, for example by setting $q(u)$ to correlate with node degree in diffusion models, which can dramatically alter the solution landscape and make specialized algorithms necessary for correct identification of organic influencers or seeds (Sun et al., 2019).
  • In SSL, the self-influence formalism exposes hidden representation biases, informs dataset curation (e.g., remove spurious high-influence points), and supports fairness analysis in demographic-balanced scenarios (Harilal et al., 22 Dec 2024).

7. Summary Table: Main Instantiations of the Self-Influence Term

| Context | Formal self-influence term | Operational role |
| --- | --- | --- |
| Deep pre-training | $\mathrm{SI}(z) = \lVert\nabla_\theta \ell(f_\theta, z)\rVert^2$ | Data weighting / filtering |
| Supervised / SIF | $I_\mathrm{SIF}(z) = -\nabla_\theta L(z)^\top H^{-1} \nabla_\theta L(z)$ | Privacy / membership inference |
| SSL | $-\nabla_\theta \ell_i^\top H^{-1} \nabla_\theta \ell_i$ | Stability, bias detection |
| Citation networks | $c_{ii}/d_i$ in the eigenvector recursion | Prestige (self-citation) |
| Diffusion | $q(u)$ (spontaneous activation probability) | Organic influence spread |

Self-influence terms are thus fundamental for quantifying self-attribution mechanisms across a range of learning and network paradigms, with their computational form and theoretical properties tailored to each modeling scenario.
