Self-Influence Term
- Self-Influence Term is a measure quantifying an entity's direct impact on its own outcome across machine learning, bibliometrics, self-supervised learning, and network diffusion contexts.
- It is computed using techniques like gradient approximations, Hessian inversions, and power iterations to inform tasks such as data weighting, bias detection, and privacy inference.
- Its multifaceted formulations support robust dataset selection and theoretical guarantees, driving practical insights into model stability and organic influence propagation.
The self-influence term quantifies the degree to which an entity (sample, node, or actor) exerts direct influence on itself within a given modeling framework. Its mathematical and operational character varies across domains including machine learning, bibliometrics, self-supervised representation learning, and network diffusion. In each of these contexts, the self-influence construct plays a critical role in quantifying importance, stability, novelty, or organic propagation, often informing data selection, robustness analysis, ranking, or theoretical guarantees.
1. Formal Definitions and Core Variants
Machine Learning and Data Attribution
In the context of deep neural networks, particularly for LLM pre-training, the self-influence (SI) of a data sample $z$ is grounded in first-order approximations of classical influence functions. For a model with parameters $\theta$, per-sample loss $\ell(z, \theta)$, and training sample $z$, the self-influence score is:

$$\mathrm{SI}(z) \;=\; \nabla_\theta \ell(z, \theta)^\top \nabla_\theta \ell(z, \theta) \;=\; \big\lVert \nabla_\theta \ell(z, \theta) \big\rVert_2^2 .$$

Optionally, this can be restricted to a subset of layers $\mathcal{L}$:

$$\mathrm{SI}_{\mathcal{L}}(z) \;=\; \sum_{l \in \mathcal{L}} \big\lVert \nabla_{\theta_l} \ell(z, \theta) \big\rVert_2^2 .$$
This first-order score serves as a scalable proxy for the instantaneous effect of $z$ on its own loss, as developed in the context of the TracIn framework (Thakkar et al., 2023).
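A minimal PyTorch sketch of this first-order score, assuming a standard model and a per-sample loss; the `layer_prefixes` argument (for the layer-restricted variant) and the function name are illustrative choices, not taken from the cited work.

```python
import torch

def self_influence(model, loss_fn, x, y, layer_prefixes=None):
    """First-order self-influence: squared gradient norm of the per-sample loss.

    If layer_prefixes is given, only parameters whose names start with one of
    the prefixes contribute (the layer-restricted variant above).
    """
    loss = loss_fn(model(x), y)
    params = [
        p for name, p in model.named_parameters()
        if p.requires_grad and (
            layer_prefixes is None
            or any(name.startswith(pfx) for pfx in layer_prefixes)
        )
    ]
    grads = torch.autograd.grad(loss, params)
    return sum(g.pow(2).sum().item() for g in grads)

# Example usage (hypothetical model and layer names):
# si = self_influence(model, torch.nn.functional.cross_entropy, x, y,
#                     layer_prefixes=("lm_head",))
```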
In full influence-function formalism, as used for privacy attacks and classical supervised learning, the self-influence (SIF) includes the local curvature via the Hessian:

$$\mathrm{SIF}(z) \;=\; \nabla_\theta \ell(z, \theta)^\top \, H_\theta^{-1} \, \nabla_\theta \ell(z, \theta),$$

where $H_\theta = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 \ell(z_i, \theta)$ is the empirical risk Hessian (Cohen et al., 2022, Harilal et al., 22 Dec 2024).
Network-Based Influence and Self-Citation
In network and bibliometric models (e.g., Pinski–Narin influence weights), the self-influence term refers to the diagonal entries $C_{ii}$ of the journal–journal citation matrix $C$, corresponding to within-node "self-citation" counts. These are normalized by column sums to $C_{ii} / \sum_k C_{ki}$ in the column-stochastic influence matrix $P$ and enter recursively into the prestige computation (Prathap et al., 2019).
Diffusion and Spontaneous Activation
In probabilistic graph diffusion models (Self-Activation Independent Cascade), the self-influence term is parameterized as $q_v$, the probability that node $v$ spontaneously adopts, independent of neighborhood or external intervention. This models organic activation as opposed to induced seeding, and directly contributes to the expected influence spread (Sun et al., 2019).
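A Monte Carlo sketch of expected spread under independent-cascade diffusion with per-node self-activation, assuming edge probabilities `p[(u, v)]`, self-activation probabilities `q[v]`, and an adjacency dict `adj`; all names and the simulation parameters are illustrative rather than taken from the cited paper.

```python
import random
from collections import deque

def expected_spread(nodes, adj, p, q, seeds=(), n_sim=1000, rng_seed=0):
    """Monte Carlo estimate of expected spread when activation starts from the
    explicit seed set plus every node that self-activates with probability q[v]."""
    rng = random.Random(rng_seed)
    total = 0
    for _ in range(n_sim):
        # organic (self-activated) nodes join the initially active set
        active = set(seeds) | {v for v in nodes if rng.random() < q.get(v, 0.0)}
        frontier = deque(active)
        while frontier:
            u = frontier.popleft()
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p.get((u, v), 0.0):
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / n_sim
```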
2. Theoretical Underpinnings and Derivation
Self-influence scores emerge from first or second-order Taylor approximations of the loss landscape or from Markovian stochastic recursions in network models. In the TracIn-based SI, the score is the first-order quadratic form $\nabla_\theta \ell(z, \theta)^\top \nabla_\theta \ell(z, \theta)$, aligning with gradient-magnitude sensitivity. SIF, by contrast, incorporates the curvature via the local Hessian inverse (see the equations above), rooted in the implicit function theorem for parameter perturbations under infinitesimal upweighting.
In network prestige models, self-influence is absorbed through iterative applications of the influence matrix, leveraging Perron–Frobenius theory for the unique eigenvector solution. In influence maximization with self-activation, probabilistic reasoning over possible worlds aggregates the per-node self-activation probabilities $q_v$ to model both organic and induced spread, while preserving submodularity and allowing for efficient greedy optimization (Sun et al., 2019).
3. Algorithmic Computation and Practical Workflows
Deep Learning (LLM Pre-Training)
To compute SI across a corpus:
- For each sample $z$, compute $\mathrm{SI}(z)$ as the accumulated squared gradient norm over the selected layers.
- This procedure can be used for offline sample selection (retain low SI examples) or online sample weighting (Presence method) (Thakkar et al., 2023).
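A sketch of these two downstream uses, assuming SI scores have already been computed per example; the function names, the keep fraction, and the temperature parameterization are illustrative, with the sign-flipped temperature mirroring the two-stage schedule discussed in Section 4.

```python
import numpy as np

def select_low_si(examples, si_scores, keep_fraction=0.8):
    """Offline filtering: keep the keep_fraction of examples with lowest SI."""
    order = np.argsort(si_scores)
    kept = order[: int(len(examples) * keep_fraction)]
    return [examples[i] for i in kept]

def si_softmax_weights(si_scores, temperature):
    """Online per-batch weighting: softmax over SI scores.

    A positive temperature up-weights high-SI (novel/hard) examples; a negative
    temperature down-weights them, as when the sign is flipped partway through
    training in a two-stage schedule."""
    z = np.asarray(si_scores, dtype=np.float64) / temperature
    z -= z.max()  # numerical stability
    w = np.exp(z)
    return w / w.sum()
```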
For SIF in membership inference:
- Solve for $H_\theta^{-1} \nabla_\theta \ell(z, \theta)$ using Hessian-vector product approximations (Pearlmutter trick, conjugate gradients), making large-scale evaluation tractable in deep networks (Cohen et al., 2022).
- Fit thresholds on reference member/non-member sets; infer membership by SIF value.
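A compact sketch of the Hessian-inverse-vector product step, assuming the empirical-risk Hessian is accessed only through Hessian-vector products (Pearlmutter's trick via double backprop) and inverted approximately with conjugate gradients; the damping value, iteration counts, and function names are illustrative.

```python
import torch

def hvp(loss, params, vec):
    """Hessian-vector product via double backprop (Pearlmutter trick)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    gv = (flat * vec).sum()
    hv = torch.autograd.grad(gv, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def conjugate_gradient(hvp_fn, b, damping=0.01, iters=50, tol=1e-6):
    """Approximately solve (H + damping * I) x = b with conjugate gradients."""
    x = torch.zeros_like(b)
    r = b.clone()
    p = r.clone()
    rs = r @ r
    for _ in range(iters):
        Ap = hvp_fn(p) + damping * p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new.sqrt() < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def sif_score(empirical_loss, sample_loss, params):
    """SIF(z) = grad_z^T H^{-1} grad_z, with both losses built on a live graph."""
    g = torch.cat([gi.reshape(-1) for gi in
                   torch.autograd.grad(sample_loss, params, retain_graph=True)])
    ihvp = conjugate_gradient(lambda v: hvp(empirical_loss, params, v), g)
    return (g @ ihvp).item()

# Membership decision: fit a threshold on reference member/non-member SIF values;
# small SIF is evidence of a well-fit (member) point, large SIF of the opposite.
```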
Representation Learning (Self-Supervised)
For SSL, self-influence is estimated by:
- Calculating the gradient of an alignment loss for two augmentations of each sample.
- Employing inverse Hessian-vector product solvers (e.g., LoGra) for practical scaling (Harilal et al., 22 Dec 2024).
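A sketch of the alignment-loss gradient step, assuming a SimCLR-style encoder and a cosine-alignment loss between two augmented views of the same sample; the inverse-Hessian weighting (e.g., a LoGra-style solver) is abstracted behind an `ihvp_fn` callable because its exact interface is not reproduced here.

```python
import torch
import torch.nn.functional as F

def ssl_self_influence(encoder, view_a, view_b, ihvp_fn=None):
    """Self-influence of one sample under an alignment loss between two views.

    ihvp_fn maps a flat gradient to an approximate H^{-1} g; if omitted, the
    identity is used and the score reduces to the squared gradient norm of the
    alignment loss."""
    za = F.normalize(encoder(view_a), dim=-1)
    zb = F.normalize(encoder(view_b), dim=-1)
    # 2 - 2*cos equals the squared L2 distance between the normalized embeddings
    align_loss = (2.0 - 2.0 * (za * zb).sum(dim=-1)).mean()
    params = [p for p in encoder.parameters() if p.requires_grad]
    g = torch.cat([gi.reshape(-1)
                   for gi in torch.autograd.grad(align_loss, params)])
    ihvp = ihvp_fn(g) if ihvp_fn is not None else g
    return (g @ ihvp).item()
```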
Network and Citation Models
- Influence weights are computed via power iteration on the normalized matrix $P$, with the diagonal self-citation terms entering identically to other links but attenuated as the recursion proceeds (Prathap et al., 2019).
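A NumPy sketch of the prestige recursion, assuming a raw citation count matrix `C` with `C[i, j]` counting citations from journal `j` to journal `i`, so that column normalization yields a column-stochastic matrix; the flag for dropping the diagonal is an illustrative convenience for comparison.

```python
import numpy as np

def influence_weights(C, include_self_citations=True, iters=100, tol=1e-12):
    """Power iteration for Pinski-Narin-style influence weights.

    Assumes every column of C has at least one citation so the column
    normalization is well defined."""
    C = np.asarray(C, dtype=float).copy()
    if not include_self_citations:
        np.fill_diagonal(C, 0.0)
    P = C / C.sum(axis=0, keepdims=True)  # column-stochastic normalization
    w = np.full(C.shape[0], 1.0 / C.shape[0])
    for _ in range(iters):
        w_new = P @ w
        w_new /= w_new.sum()
        if np.abs(w_new - w).max() < tol:
            break
        w = w_new
    return w
```

Comparing runs with and without the diagonal across iteration counts reproduces the neutralization effect discussed in Section 5.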
Influence Maximization
- Self-activation parameters $q_v$ are integrated into simulation-based spread computation and greedy seeding algorithms for maximization tasks, preserving approximation guarantees (Sun et al., 2019).
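A sketch of greedy seed selection on top of a spread estimator such as the `expected_spread` sketch in Section 1, passed in here as a callable so the snippet stays self-contained; the function name and marginal-gain bookkeeping are illustrative.

```python
def greedy_seeds(nodes, spread_fn, k):
    """Greedy seed selection: at each step add the node with the largest
    marginal gain in estimated spread. spread_fn(seeds) should already include
    organic self-activation, as in the Section 1 sketch."""
    seeds = set()
    base = spread_fn(seeds)
    for _ in range(k):
        best_node, best_gain = None, 0.0
        for v in nodes:
            if v in seeds:
                continue
            gain = spread_fn(seeds | {v}) - base
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:
            break
        seeds.add(best_node)
        base += best_gain
    return seeds
```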
4. Empirical Properties and Significance
- In deep learning, high SI often identifies "hard" or under-represented examples in early pre-training, but noisy or outlier samples in late training. A two-stage schedule (positive to negative softmax temperature) in SI-driven weighting first emphasizes, then de-emphasizes, high-SI points—improving both novelty discovery and training stability (Thakkar et al., 2023).
- In self-supervised learning, high self-influence points are empirically associated with atypical backgrounds or underrepresented subgroups, while low-influence samples are often duplicates. Removal of high-influence points may counterintuitively improve downstream classification, indicating their association with spurious invariances (Harilal et al., 22 Dec 2024).
- In the SIF context, small SIF values are linked to well-fit, typical (often member) points, whereas large (magnitude) SIF values signal that a sample is impactful in the loss landscape and thus more vulnerable to privacy leakage (Cohen et al., 2022).
5. Integration and Neutralization in Network Measures
- In Pinski–Narin influence weights, self-citation (self-influence) initially biases prestige, but after sufficient power-iteration these diagonal entries become "neutralized," exerting negligible skew on the stationary solution. Empirical studies show differences between influence weights with/without self-citations decay exponentially and become marginal after a handful of iterations (Prathap et al., 2019).
- Theoretical analysis connects this neutralization to the column-stochastic normalization and the spectral properties of $P$, ensuring that self-influence does not unduly inflate rankings, unlike raw metrics such as the journal impact factor (JIF).
6. Extensions and Modeling Considerations
- Self-influence can be parameterized heterogeneously, for example by setting $q_v$ to correlate with node degree in diffusion models (see the sketch after this list), which can dramatically alter the solution landscape and make specialized algorithms necessary for correct identification of organic influencers or seeds (Sun et al., 2019).
- In SSL, the self-influence formalism exposes hidden representation biases, informs dataset curation (e.g., remove spurious high-influence points), and supports fairness analysis in demographic-balanced scenarios (Harilal et al., 22 Dec 2024).
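A small sketch of one degree-correlated parameterization of $q_v$ mentioned above; the normalization by maximum degree and the scaling constant are arbitrary illustrative choices.

```python
def degree_scaled_self_activation(nodes, adj, scale=0.1):
    """One heterogeneous parameterization: q_v proportional to normalized
    out-degree, capped at 1."""
    degree = {v: len(adj.get(v, ())) for v in nodes}
    max_deg = max(max(degree.values(), default=0), 1)
    return {v: min(1.0, scale * degree[v] / max_deg) for v in nodes}
```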
7. Summary Table: Main Instantiations of the Self-Influence Term
| Context | Formal Self-Influence Term | Operational Role |
|---|---|---|
| Deep pre-training | $\mathrm{SI}(z) = \lVert \nabla_\theta \ell(z, \theta) \rVert_2^2$ | Data weighting/filtering |
| Supervised/SIF | $\mathrm{SIF}(z) = \nabla_\theta \ell(z, \theta)^\top H_\theta^{-1} \nabla_\theta \ell(z, \theta)$ | Privacy/membership inference |
| SSL | Alignment-loss gradient with inverse-Hessian weighting | Stability, bias detection |
| Citation | Diagonal entry $C_{ii}$ in eigenvector recursion | Prestige (self-citation) |
| Diffusion | $q_v$ (spontaneous activation probability) | Organic influence spread |
Self-influence terms are thus fundamental for quantifying self-attribution mechanisms across a range of learning and network paradigms, with their computational form and theoretical properties tailored to each modeling scenario.