Influence-based Attacks
- Influence-based attacks are adversarial strategies that exploit model and data sensitivities to manipulate decision processes in machine learning and networked systems.
- They utilize influence functions, submodular optimization, and targeted perturbations to degrade model accuracy and alter predictions under constrained budgets.
- Empirical studies show significant impacts, including accuracy drops in GNNs and recommendation shifts in recommender systems, highlighting urgent security challenges.
Influence-based attacks constitute a family of adversarial methodologies that exploit, manipulate, or subvert the mechanisms by which influence—broadly construed as the propagation of effects through data, models, or networks—drives decisions in machine learning systems and networked environments. The unifying theme is the adversarial use of influence, operationalized through tools such as influence functions, influence maximization/submodular optimization, and data-driven surrogate learning, to either degrade performance, subvert attributions, or manipulate downstream predictions under constrained information or action budgets. These attacks arise in a variety of machine learning, network science, security, and social computing contexts.
1. Fundamental Principles and Frameworks
At the theoretical core of influence-based attacks is the observation that many learning and inference systems propagate the effect of data, features, or nodes through linear or non-linear transformation chains, where the impact of a small set of components can be amplified, modulated, or redirected. Influence functions—rooted in robust statistics and convex optimization—quantify the infinitesimal effect of upweighting or perturbing a training data point on a downstream metric (usually the loss or prediction at a test example). For a model with parameters $\hat{\theta}$ minimizing the empirical risk, the general formula is

$$\mathcal{I}(z, z_{\text{test}}) \;=\; -\,\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z, \hat{\theta}),$$

where $H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta})$ is the Hessian of the empirical risk. Variants appear for specific objectives, model families, or “attack surfaces.”
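For concreteness, the sketch below evaluates this formula for an L2-regularized logistic regression, where the Hessian has a closed form; the data, regularizer, and function names are illustrative, and the parameters are assumed to be at (or near) the empirical-risk minimizer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grad(theta, x, y, lam):
    """Gradient of one example's regularized logistic loss (y in {0,1})."""
    return (sigmoid(x @ theta) - y) * x + lam * theta

def empirical_hessian(theta, X, lam):
    """Hessian of the L2-regularized empirical risk (closed form)."""
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)                       # per-example curvature weights
    n, d = X.shape
    return (X.T * w) @ X / n + lam * np.eye(d)

def influence_scores(theta, X_tr, y_tr, x_te, y_te, lam=1e-2):
    """I(z_i, z_test) = -grad(z_test)^T H^{-1} grad(z_i) for each training point."""
    H = empirical_hessian(theta, X_tr, lam)
    g_test = per_example_grad(theta, x_te, y_te, lam)
    v = np.linalg.solve(H, g_test)          # H^{-1} grad(z_test)
    return np.array([-v @ per_example_grad(theta, x, y, lam)
                     for x, y in zip(X_tr, y_tr)])

# Toy usage: the most positive scores mark training points whose upweighting
# raises the test loss. (theta is a stand-in for the fitted minimizer.)
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(100, 5)), rng.integers(0, 2, 100)
theta = np.zeros(5)
print(influence_scores(theta, X_tr, y_tr, X_tr[0], y_tr[0])[:5])
```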
The influence-maximization paradigm is established in network science: given a graph with probabilistic propagation dynamics (e.g., linear threshold or independent cascade), select a small seed set, or cut a small number of nodes/edges, to maximize or minimize spread. The theoretical backbone is the NP-hardness of optimal seed selection, coupled with strong submodularity properties that admit nearly optimal greedy approximation algorithms.
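A minimal sketch of the greedy seed-selection loop under an independent cascade model follows; the toy graph, propagation probability, and Monte Carlo settings are placeholders, and the (1−1/e) guarantee applies to the exact expected-spread objective rather than its sampled estimate.

```python
import random

def simulate_ic(graph, seeds, p=0.1, rng=random):
    """One independent-cascade simulation; returns the set of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def expected_spread(graph, seeds, p=0.1, n_sim=200):
    """Monte Carlo estimate of the expected cascade size."""
    return sum(len(simulate_ic(graph, seeds, p)) for _ in range(n_sim)) / n_sim

def greedy_seed_selection(graph, k, p=0.1):
    """Standard greedy maximization of the monotone submodular spread objective."""
    seeds = []
    for _ in range(k):
        base = expected_spread(graph, seeds, p)
        best, best_gain = None, -1.0
        for v in graph:
            if v in seeds:
                continue
            gain = expected_spread(graph, seeds + [v], p) - base
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
    return seeds

# Toy adjacency list; an attacker maximizes spread by seeding, or suppresses it
# by deleting high-influence nodes/edges.
g = {0: [1, 2], 1: [2, 3], 2: [3, 4], 3: [4], 4: [0]}
print(greedy_seed_selection(g, k=2))
```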
2. Canonical Attack Scenarios and Methodologies
2.1 Influence-based Feature Attacks on GNNs
Attacks on Graph Neural Networks (GNNs), such as those formalized in (Ma et al., 2021), proceed by choosing a budgeted subset of nodes $S$ (constrained by degree and cardinality) and perturbing their features by a fixed vector $\epsilon$ to maximize misclassification over the graph. Under the “random-ReLU” assumption, the logit response of every node becomes a linear function of the sum of $L$-length random-walk influences from the seeds. The attack is cast as a submodular maximization problem of the form

$$\max_{S \subseteq V,\;|S| \le b} \;\sum_{u \in V} \Pr\!\left[\sum_{v \in S} W^{L}_{uv} \ge \theta_u\right],$$

where $W^{L}$ is the $L$-step random-walk matrix and the thresholds $\theta_u$ depend on logit margins and the perturbation vector’s alignment in feature direction. Greedy algorithms using surrogate threshold distributions (uniform or normal) are both theoretically sound (via submodular maximization guarantees) and empirically dominant. Key numerical results: on Cora/GCN, test accuracy drops from ~85.5% to 68.8%, outperforming all baselines at fixed, realistic attack budgets.
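A schematic of the selection step, consistent with the description above but not the authors' implementation, might look as follows; the surrogate thresholds, budget, and degree cap are illustrative.

```python
import numpy as np

def random_walk_influence(adj, L):
    """Row-normalized adjacency raised to the L-th power: W^L[u, v] measures how
    much of seed v's feature perturbation reaches node u after L hops."""
    deg = adj.sum(axis=1, keepdims=True)
    W = adj / np.maximum(deg, 1)
    return np.linalg.matrix_power(W, L)

def expected_misclassified(WL, seeds, thresholds):
    """Count nodes whose accumulated influence from the seed set exceeds a
    surrogate threshold (stand-in for the logit-margin condition)."""
    if not seeds:
        return 0.0
    total = WL[:, seeds].sum(axis=1)
    return float((total >= thresholds).sum())

def greedy_attack_set(WL, thresholds, budget, degrees, max_degree):
    """Greedy selection under cardinality and degree constraints."""
    chosen = []
    candidates = [v for v in range(WL.shape[0]) if degrees[v] <= max_degree]
    for _ in range(budget):
        base = expected_misclassified(WL, chosen, thresholds)
        gains = {v: expected_misclassified(WL, chosen + [v], thresholds) - base
                 for v in candidates if v not in chosen}
        chosen.append(max(gains, key=gains.get))
    return chosen

# Toy graph with 6 nodes and uniform surrogate thresholds.
rng = np.random.default_rng(1)
A = (rng.random((6, 6)) < 0.4).astype(float)
np.fill_diagonal(A, 0)
WL = random_walk_influence(A, L=2)
thr = rng.uniform(0.1, 0.5, size=6)          # surrogate (e.g., uniform) thresholds
print(greedy_attack_set(WL, thr, budget=2, degrees=A.sum(1), max_degree=4))
```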
2.2 Data Poisoning via Influence Functions in Recommender Systems
In matrix-factorization recommenders, the attacker injects fake users with interaction patterns optimized to increase (or decrease) the probability that a designated item appears in as many users’ top-N lists as possible (Fang et al., 2020). The optimization is NP-hard but can be approached by identifying a small, submodular set of “high-influence” users—those whose interaction changes most affect the recommendation score for the target item—and only optimizing poison data for this subset. The influence of removing or modifying a user-item edge on the score for a target user/item pair is computed via first-order influence function perturbations involving the loss Hessian, and greedy incremental algorithms yield scalable attack procedures. At a 3% injection rate, hit-rate HR@10 is driven from 0.0017 (no attack) to 0.417 (S-TNA-Inf), surpassing all tested baselines.
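The selection step can be sketched as below; the influence scores here are random placeholders standing in for the Hessian-based estimates described above, and the subsequent optimization of fake profiles against the selected subset is omitted.

```python
import numpy as np

def select_influential_users(influence_on_target, k):
    """Pick the k users whose interaction changes most move the target item's
    score; poison profiles are then optimized against this subset only."""
    return np.argsort(-np.abs(influence_on_target))[:k]

# Placeholder: one influence estimate per real user for the target item's score
# (a stand-in for the first-order, Hessian-based estimates).
rng = np.random.default_rng(2)
infl = rng.normal(size=1000)
print(select_influential_users(infl, k=50)[:10])
```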
2.3 Influence-Guided Data Poisoning in Sequence Generation and Summarization
For abstractive summarization, influence-based poisoning selects and perturbs a high-influence subset of training summaries—identified by the largest estimated effect on validation/test loss via Hessian-vector products—such that injecting contrastive or toxic rewrites can force models into systematic behavioral changes (Thota et al., 26 Oct 2024). Empirical analysis shows that poisoning just 30% of summaries with inverted sentiment causes 88–90% of test summaries to invert sentiment, while also shifting models from abstractive to extractive output regimes.
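A generic sketch of the underlying machinery—estimating per-example influence on a validation loss with finite-difference Hessian-vector products and a conjugate-gradient solve—is given below for a ridge-regression surrogate; it is not the cited pipeline, and all names and data are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Toy ridge-regression training/validation sets (stand-ins for the real model).
rng = np.random.default_rng(3)
X, y = rng.normal(size=(200, 10)), rng.normal(size=200)
X_val, y_val = rng.normal(size=(50, 10)), rng.normal(size=50)
lam = 1e-2
theta = np.linalg.solve(X.T @ X / len(X) + lam * np.eye(10), X.T @ y / len(X))

def grad_train(t):
    """Gradient of the regularized training risk at parameters t."""
    return X.T @ (X @ t - y) / len(X) + lam * t

def hvp(v, eps=1e-5):
    """Finite-difference Hessian-vector product: H v ~ (g(theta+eps v) - g(theta-eps v)) / 2 eps."""
    return (grad_train(theta + eps * v) - grad_train(theta - eps * v)) / (2 * eps)

# Solve H s = grad L_val(theta) with conjugate gradients, never forming H.
g_val = X_val.T @ (X_val @ theta - y_val) / len(X_val)
s, _ = cg(LinearOperator((10, 10), matvec=hvp), g_val)

# Per-example influence on the validation loss; the highest-magnitude examples
# are the natural poisoning targets described above (top 30% here).
per_example_grads = X * (X @ theta - y)[:, None] + lam * theta
influence = -per_example_grads @ s
print(np.argsort(-np.abs(influence))[:int(0.3 * len(X))][:10])
```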
2.4 Influence-based Attacks in Sequential Recommenders
Profile pollution attacks on sequential recommenders are enhanced using influence function–based attack planning rather than simple gradient-based heuristics (Du et al., 2 Dec 2024). INFAttack computes, for each possible item insertion, the expected change in the target item's rank in users' recommendation lists via second-order (inverse-Hessian) approximations. This approach robustly outperforms baseline attacks, especially for unpopular items, and achieves up to a 20-fold gain in NDCG@10 for tail items at low injection rates.
2.5 Influence Attacks on Attribution and Memorization Mechanisms
Memorization and influence scores themselves are attackable targets: inputs crafted via the Moore–Penrose pseudoinverse can produce arbitrarily high influence or memorization ratings under black-box estimation regimes (Do et al., 24 Sep 2025). Analytical results show that unless influence estimation is formally stabilized (e.g., by differential privacy), there exist small, efficiently computable perturbations that can manipulate influence/attribution or data value measurements by orders of magnitude, with negligible impact on test set accuracy or outputs.
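As a purely illustrative toy (not the cited construction): if the score reported by a black-box influence estimator behaves approximately linearly in the input features, a least-norm perturbation that drives the reported score to an arbitrary target follows directly from the Moore–Penrose pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(4)
w = rng.normal(size=64)            # surrogate: locally linear influence estimator
x = rng.normal(size=64)            # input whose influence score is being attacked
target_score = 0.6                 # desired (inflated) influence/memorization score

# Least-norm delta solving w . (x + delta) = target_score via the pseudoinverse.
delta = np.linalg.pinv(w.reshape(1, -1)) @ np.array([target_score - w @ x])
x_adv = x + delta.ravel()

# Original score, forced score, and the (small) perturbation norm.
print(w @ x, w @ x_adv, np.linalg.norm(delta))
```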
2.6 Influence Attacks on Cooperative Sensing and Social Systems
In distributed settings—such as cooperative spectrum sensing—attackers can learn a black-box aggregator’s (fusion center) decision boundary via observed outputs and their own local measurements, and sequentially craft small, minimally detectable perturbations that flip global outcomes (Luo et al., 2019). The LEB (Learning-Evaluation-Beating) approach is empirically shown to cause up to 82% decision-flipping even under state-of-the-art fusion center defenses. Countermeasures based on bounding the maximal group influence yield a sharp reduction in attack efficacy.
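A schematic of the learn-then-beat idea—not the LEB algorithm itself—is sketched below: fit a surrogate to observed fusion decisions, then search for the smallest change to attacker-controlled reports that flips the surrogate's prediction. The majority-rule fusion center, sensor counts, and classifier choice are assumptions.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n_sensors, n_rounds, n_controlled = 10, 500, 3

# Observed history: binary sensing reports and a toy majority-rule fusion decision.
reports = rng.integers(0, 2, size=(n_rounds, n_sensors))
decisions = (reports.sum(axis=1) > n_sensors / 2).astype(int)

# Black-box surrogate of the fusion center's decision boundary.
surrogate = LogisticRegression().fit(reports, decisions)

def minimal_flip(current_reports, controlled=range(n_controlled)):
    """Smallest subset of attacker-controlled reports to flip so that the
    surrogate's predicted fusion decision changes."""
    original = surrogate.predict(current_reports.reshape(1, -1))[0]
    for k in range(1, len(controlled) + 1):
        for subset in combinations(controlled, k):
            trial = current_reports.copy()
            trial[list(subset)] = 1 - trial[list(subset)]
            if surrogate.predict(trial.reshape(1, -1))[0] != original:
                return subset
    return None   # decision not flippable within the controlled budget

# Demonstrate on a near-tie round, where a small perturbation suffices.
near_tie = np.argmin(np.abs(reports.sum(axis=1) - n_sensors / 2))
print(minimal_flip(reports[near_tie]))
```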
2.7 Influence Attacks on Social Network Dynamics
Attacks on influence spread in social networks leverage submodularity of influence reduction under linear threshold models; optimal strategies for limiting viral or rumor propagation are efficiently approximated by deleting nodes/edges to cover sampled “reverse-reachable” path sets, generalizing classical max-cover strategies (Sun et al., 2022).
Empirical results demonstrate that forward-backward and DAG-based VRR sampling can prune network connectivity to sharply reduce influence, with strong approximation guarantees and efficient scaling.
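A simplified sketch of the reverse-reachable-set idea (much reduced relative to the VRR sampling above): sample RR sets, then greedily delete the nodes covering the most sets, as in max-cover. The edge-sampling probability and toy graph are placeholders.

```python
import random

def sample_rr_set(nodes, in_neighbors, p, rng=random):
    """Reverse-reachable set: nodes that could influence a random target,
    under independent-cascade-style sampling on reversed edges."""
    target = rng.choice(nodes)
    rr, frontier = {target}, [target]
    while frontier:
        nxt = []
        for u in frontier:
            for v in in_neighbors.get(u, []):
                if v not in rr and rng.random() < p:
                    rr.add(v)
                    nxt.append(v)
        frontier = nxt
    return rr

def greedy_blockers(nodes, in_neighbors, k, p=0.3, n_samples=2000):
    """Delete the k nodes covering the most sampled RR sets (max-cover greedy),
    a proxy for minimizing the remaining influence spread."""
    rr_sets = [sample_rr_set(nodes, in_neighbors, p) for _ in range(n_samples)]
    blocked = []
    for _ in range(k):
        if not rr_sets:
            break
        counts = {}
        for rr in rr_sets:
            for v in rr:
                counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)
        blocked.append(best)
        rr_sets = [rr for rr in rr_sets if best not in rr]   # covered sets removed
    return blocked

# Toy directed graph given by in-neighbor lists.
in_nb = {0: [4], 1: [0], 2: [0, 1], 3: [1, 2], 4: [2, 3]}
print(greedy_blockers(list(in_nb), in_neighbors=in_nb, k=2))
```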
3. Theoretical Properties: Submodularity, NP-Hardness, Concentration
Across attack surfaces, the underlying attack objective—maximizing misclassification, recommendation inclusion, or viral suppression—is almost always shown to be monotone and submodular in seed or attacked set selection. This enables the application of greedy algorithms with (1–1/e) or (1/2–ε) approximation guarantees, per seminal results (Nemhauser et al., Fisher–Nemhauser–Wolsey).
When random thresholds are incorporated (as in GNN logit threshold modeling, (Ma et al., 2021)), expected attack gain remains submodular and nondecreasing under minimal assumptions on the cumulative distribution functions. Empirical misclassification rates concentrate around their means under weak dependency conditions (Hoeffding/Chernoff bounds), though full formalization under GNN-induced dependencies remains open.
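Under the idealized assumption that the per-node misclassification indicators $X_u \in \{0,1\}$ are independent, Hoeffding's inequality gives

$$\Pr\!\left(\,\Bigl|\sum_{u \in V} X_u - \mathbb{E}\Bigl[\sum_{u \in V} X_u\Bigr]\Bigr| \ge t\right) \;\le\; 2\exp\!\left(-\frac{2t^2}{|V|}\right),$$

so the empirical attack gain deviates from its mean by more than $O(\sqrt{|V|})$ only with exponentially small probability; the dependence induced by GNN message passing is precisely what the open formalization question concerns.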
Conversely, the underlying selection problem is nearly always NP-hard (reduction from Set Cover or related combinatorial problems). In some attribution-attack regimes, impossibility results identify instances fundamentally immune to influence attack regardless of model parameterization.
4. Empirical Results and Performance
Table: Representative Numerical Results from Influence-based Attacks
| Domain | Attack Setup | Reference (no attack or best baseline) | Influence-based Result | Gain |
|---|---|---|---|---|
| GNN (Cora) | r=1% of N, GCN | 70.0% | 68.8% (InfMax-Unif) | –1.2% acc. |
| Recommender (Music) | 3% fake users, HR@10 | 0.0017 | 0.417 (S-TNA-Inf) | +0.4153 |
| Summarization | 30% poison (inversion) | ~0% | ~88–90% inversion | +88–90% |
| Sequential Rec. | NDCG@10 (tail, ML-1M) | ≤0.073 | 0.274 (INFAttack) | 3–20× |
| Memorization score | PINV attack | 0.01 | 0.6 | +0.59 |
Influence-based attacks consistently outperform random selection, centrality-based, or simple gradient-based heuristics under identical budgets, with statistically significant improvements. Notably, these attacks remain effective under partial knowledge, transfer well across architectures, and remain stealthy against standard anomaly or outlier-based defenses.
5. Limitations, Defenses, and Open Problems
Limitations of influence-based attacks include reliance on (a) accurate influence surrogates (which may differ from true margins or thresholds due to black-box or restricted white-box access), (b) assumptions about network or data structure (e.g., independence of logit thresholds, randomness/curvature smoothness), and (c) the need for a minimum attack budget for concentration effects to manifest.
Corresponding defenses exploit these same structures: margin-maximizing regularization in GNNs or neural models can make influence-based misclassification harder by increasing the denominator in threshold expressions, adversarial feature training can reduce sensitivity to low-budget perturbations, and explicit influence-limiting in aggregation settings caps the attackable spread. Differential privacy and formal stability mechanisms are required to robustify influence-based attributions and data valuation systems.
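One generic instantiation of such margin-maximizing regularization (a sketch, not a specific paper's defense) augments the training loss with a hinge penalty on small logit margins:

$$\min_{\theta}\; \frac{1}{|V|}\sum_{u \in V} \ell\bigl(f_\theta(u), y_u\bigr) \;+\; \lambda \sum_{u \in V} \max\bigl(0,\; \gamma - m_u(\theta)\bigr),$$

where $m_u(\theta)$ is the logit margin at node $u$, $\gamma$ a target margin, and $\lambda$ a trade-off weight; larger margins enlarge the thresholds that an influence-based attack must exceed.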
Open theoretical questions remain concerning the exact concentration of attack effects under global dependence, possible intrinsic robustness to group influence attack in highly assortative multiplex networks, and the extendability of efficient influence-based poisoning to non-convex deep model families.
6. Extensions and Broader Impacts
Influence-based attack methodologies extend to diverse domains: multi-layer and multiplex network robustness, sequence modeling, distributed sensor networks, reputation systems, social engineering via peer-bot cascades, and coordination detection in social media. In many such settings, influence-based attacks expose fundamental vulnerabilities in currently deployed systems, especially where attributions or recommendations are automated and unprotected, underscoring the importance of integrating influence-robust design in both model and system-level architectures.
In summary, influence-based attacks leverage deep theoretical connections between model/data/network sensitivity and tractable submodular optimization, yielding efficient and transferable attack algorithms. Their empirical effectiveness and generality demand a reevaluation of attribution, recommendation, and network-defense practices to anticipate and counteract adversarial influence.