
Gradient Assembly Poisoning (GAP)

Updated 9 January 2026
  • Gradient Assembly Poisoning (GAP) is a family of targeted attacks that stealthily aligns aggregated model gradients with adversarial objectives to trigger misclassification or semantic drift.
  • GAP exploits both data-space methods, like clean-label gradient matching, and parameter-space approaches using LoRA in federated settings to effectively corrupt large-scale models.
  • Empirical results demonstrate that GAP achieves high misclassification rates (up to nearly 100%) and severe performance degradation while evading standard anomaly and sanitization defenses.

Gradient Assembly Poisoning (GAP) refers to a family of targeted poisoning attacks that subvert learning algorithms by steering model parameter updates, through careful selection of input manipulations or submitted model updates, so that their aggregated effect matches an adversary's objective gradient. GAP attacks are realized both in data space, via training-data poisons crafted to align aggregate gradients with test-time adversarial directions, and in parameter space, via malicious assembly of low-rank adaptation (LoRA) matrices in federated learning. GAP enables highly effective, stealthy corruption of modern large-scale neural models, including LLMs, and often evades standard anomaly detection and data sanitization defenses.

1. Background and Motivation

Gradient Assembly Poisoning has two complementary instantiations. The original formulation targets data-space poisoning in deep supervised learning settings, such as image classifiers trained from scratch on clean-label, limited-budget poisoned data. The recent parameter-space variant is tailored to federated/distributed fine-tuning of LLMs with Low-Rank Adaptation (LoRA).

The common thread is the exploitation of the directed nature of model parameter updates under SGD or FedAvg. By ensuring that the cumulative update induced by a (potentially small) set of poison inputs or parameter blocks aligns with particular gradients, the adversary can reliably precipitate targeted misbehavior, such as test-time misclassification, semantic drift, or subversive content generation (Dong et al., 2 Jan 2026; Geiping et al., 2020).

2. GAP in Data-Space: Clean-Label, Gradient-Matching Poison

In the context of data poisoning, GAP proceeds as follows (Geiping et al., 2020):

  • Threat Model: The attacker can perturb a small fraction $P$ of training samples $(x_i, y_i)$, but does not alter their labels ("clean-label"); the rest remain untouched. The model is trained from scratch.
  • Poison Objective: For a fixed target $(x^t, y^{adv})$, select perturbations $\Delta_i$ for $i \in I_p$ (the poisoned set) under a per-sample $\ell_\infty$-norm bound $\|\Delta_i\|_\infty \leq \epsilon$ so that the model $f_\theta$ trained on the full set will misclassify $x^t$ as $y^{adv}$.
  • Gradient Alignment: Rather than unrolled bilevel optimization, the core technical mechanism is to force the aggregated training gradient of the poisoned points, $g^p(\Delta) = \sum_{i\in I_p} \nabla_\theta \ell(f_\theta(x_i+\Delta_i), y_i)$, to align with the desired adversarial test gradient $g^t = \nabla_\theta \ell(f_\theta(x^t), y^{adv})$. Formally, the optimization is:

$$\min_{\Delta:\ \|\Delta\|_\infty \leq \epsilon} \mathcal{B}(\Delta;\theta) = 1 - \frac{\langle g^t, g^p(\Delta) \rangle}{\|g^t\|_2\,\|g^p(\Delta)\|_2}$$

where $\mathcal{B}$ is the cosine-similarity loss (a minimal sketch of this objective follows the list below).

  • Outcome: This alignment ensures, by Zoutendijk’s theorem, monotonic descent of the adversarial loss and target misclassification with very high reliability—even for large models and industrial-scale datasets. For ImageNet/ResNet, this achieves 80–100% success rates with only 0.1–1% poisoned points.
  • Stealth: Poison examples are nearly indistinguishable (by all tested sanitizers) from valid training data, as their feature statistics remain deeply in-distribution (Geiping et al., 2020).
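
The gradient-alignment objective can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: `model`, `loss_fn`, the poison batch `(x_p, y_p)` with perturbation `delta`, and the target pair `(x_t, y_adv)` are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def gap_alignment_loss(model, loss_fn, x_p, y_p, delta, x_t, y_adv):
    """Cosine-similarity loss B(delta; theta) = 1 - cos(g^t, g^p(delta))."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Adversarial target gradient g^t = grad_theta loss(f_theta(x^t), y^adv)
    g_t = torch.autograd.grad(loss_fn(model(x_t), y_adv), params)

    # Aggregated poison gradient g^p(delta); create_graph=True keeps it
    # differentiable with respect to delta so the attacker can optimize it.
    g_p = torch.autograd.grad(loss_fn(model(x_p + delta), y_p), params,
                              create_graph=True)

    g_t_vec = torch.cat([g.detach().flatten() for g in g_t])
    g_p_vec = torch.cat([g.flatten() for g in g_p])
    return 1.0 - F.cosine_similarity(g_t_vec, g_p_vec, dim=0)
```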

3. GAP in Parameter-Space: LoRA-based Federated/Distributed Attacks

In distributed fine-tuning with LoRA, GAP exploits protocol weaknesses where LoRA parameters $A \in \mathbb{R}^{d \times r}$ and $B \in \mathbb{R}^{r \times k}$ are separately submitted and validated in federated aggregation (Dong et al., 2 Jan 2026).

  • LoRA Protocol: The pretrained weight $W$ is frozen, with updates $\Delta W = AB$. Federated clients send $(A_i, B_i)$, which are individually validated (norm/distance checks) and averaged to $\bar{A}$, $\bar{B}$; the server then updates $W \leftarrow W + \bar{A}\bar{B}$.
  • Vulnerability: The product $A_i B_i$ is never directly validated; thus, individually benign $A_i, B_i$ can be constructed such that $A_i B_i$ implements a targeted semantic corruption.
  • Threat Model: The adversary controls a quota $\alpha$ of clients, cannot access other clients' data, and follows all matrix submission rules, mimicking the temporal and spatial dynamics of benign updates.
  • Four Systemic Vulnerabilities:
  1. Verification Gaps: Independent checks on $A$ and $B$ miss a malicious product $AB$.
  2. Layer-wise Isolation: No cross-layer or global consistency checks.
  3. Bias Accumulation: Low-rank constraint enables subtle malicious drift to accrue undetected across many rounds.
  4. Parameter-Behavior Mismatch: Normal $A, B$ statistics can mask malicious semantics when the matrices are recomposed.
  • Attack Formulation: The adversary first obtains target adapter matrices $(A_\text{target}, B_\text{target})$ by adversarial fine-tuning on "poison" data, then in each round solves a convex projection minimizing:

$$\left\| E[\bar{A}^{(t)} \mid A_i] - A_\text{target} \right\|_2 + \left\| E[\bar{B}^{(t)} \mid B_i] - B_\text{target} \right\|_2$$

subject to temporal and spatial constraints inferred from historical benign updates (a minimal sketch of this per-round step follows the list below).

  • Effectiveness: GAP achieves BLEU reductions of up to 14.5% and a ≥800% increase in grammatical and factual errors, while preserving 92.6% of response length, which maintains surface fluency and evades surface-level detectors.
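
A minimal NumPy sketch of the per-round projection in the attack formulation above, assuming the attacker holds $(A_\text{target}, B_\text{target})$, an estimate of the benign clients' mean update, and knowledge of the aggregator's norm check. The function names, the plain-FedAvg model of the aggregate, and the single $\ell_2$-ball constraint are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def project_l2_ball(x, center, radius):
    """Euclidean projection of x onto the ball {z : ||z - center||_F <= radius}."""
    diff = x - center
    norm = np.linalg.norm(diff)
    return x if norm <= radius else center + diff * (radius / norm)

def craft_lora_submission(A_target, A_benign_mean, n_clients, n_attackers,
                          norm_center, norm_radius):
    """One malicious client's A-matrix for the current round (B is handled
    identically): steer the expected FedAvg aggregate toward A_target, then
    project into the region the server's per-client check accepts."""
    # Under plain FedAvg with n_attackers colluding clients submitting A_i:
    #   E[A_bar | A_i] ~= (n_attackers*A_i + (n_clients - n_attackers)*A_benign_mean) / n_clients
    # Solving E[A_bar | A_i] = A_target for A_i gives the unconstrained optimum:
    A_unconstrained = (n_clients * A_target
                       - (n_clients - n_attackers) * A_benign_mean) / n_attackers
    # Temporal/spatial stealth constraints are modeled here as a single l2 ball
    # around statistics of historical benign updates.
    return project_l2_ball(A_unconstrained, norm_center, norm_radius)
```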

4. Algorithmic Procedures and Practical Considerations

Data Poison GAP

  • Procedure: Two nested loops: (a) restarts; (b) signed-Adam gradient steps optimizing the cosine similarity between $g^t$ and $g^p(\Delta)$, with projection onto the allowed perturbation ball (see the sketch after this list).
  • Scalability: Each step updates only $P \ll N$ samples; computation is tractable even at ImageNet scale.
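
A hedged sketch of this crafting loop, reusing `gap_alignment_loss` from the earlier sketch; a plain signed-gradient step stands in for the signed-Adam update described above, and the hyperparameters (`eps`, `steps`, `restarts`, `lr`) are illustrative.

```python
import torch

def craft_poison(model, loss_fn, x_p, y_p, x_t, y_adv,
                 eps=16 / 255, steps=250, restarts=8, lr=0.1):
    """Restarted signed-gradient crafting of clean-label poison perturbations,
    projected onto the l_inf ball of radius eps."""
    best_delta, best_loss = None, float("inf")
    for _ in range(restarts):
        # Start from a random point inside the allowed perturbation ball.
        delta = ((torch.rand_like(x_p) * 2 - 1) * eps).requires_grad_(True)
        for _ in range(steps):
            loss = gap_alignment_loss(model, loss_fn, x_p, y_p, delta, x_t, y_adv)
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta -= lr * grad.sign()                     # signed descent step
                delta.clamp_(-eps, eps)                       # project onto l_inf ball
                delta.copy_((x_p + delta).clamp(0, 1) - x_p)  # keep pixels valid
        final = gap_alignment_loss(model, loss_fn, x_p, y_p,
                                   delta, x_t, y_adv).item()
        if final < best_loss:
            best_loss, best_delta = final, delta.detach().clone()
    return best_delta
```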

LoRA-Fed GAP

  • Procedure:
  1. Offline: Adversarial LoRA fine-tuning yields $(A_\text{target}, B_\text{target})$.
  2. Per round: A projected update computation steers the aggregated $(\bar{A}, \bar{B})$ toward the target matrices, within norm bounds.
  • Closed-Form Solution: The convex projection (Euclidean projection onto $\ell_2$-balls) admits a closed form, adding negligible runtime over local fine-tuning (Dong et al., 2 Jan 2026).

5. Empirical Results and Attack Impact

Data-Space GAP

  • CIFAR-10: 1% poison with $\ell_\infty = 16$ yields ∼90% target compromise on ResNet-18 in 30 minutes (1 GPU); prior methods (MetaPoison, feature-collision) achieve at most 55% at vastly greater cost. Across 100 "Just-Poison-1%" benchmarks, GAP achieves 45–55% success.
  • ImageNet: 0.1% poison with $\ell_\infty = 8$ yields ∼80% success; at $\ell_\infty = 16$, nearly 100%. Success generalizes to varied architectures (ResNet-34, VGG-16, MobileNet-V2) (Geiping et al., 2020).
  • Multi-target: Dividing the poison budget yields ∼36% per-target compromise at $T = 4$ targets.

LoRA-based GAP

  • Models: LLaMA-7B/13B/33B, ChatGLM-6B, GPT-2.
  • Metrics:
    • BLEU drops by up to 14.5%
    • Factual/grammar error increases up to 800%; ChatGLM-6B perplexity 21.3→70.1, grammar errors 3.9→13.3
    • Long-form fluency unchanged (92.6% response preservation)
  • Topic sensitivity: Factual QA error increases of up to 833% (LLaMA-7B); subjective-content error increases of up to 568% (LLaMA-13B).
  • Efficiency: GAP converges in half as many rounds as traditional data poisoning; per-round GPU cost is $\ll 10^{-6}$ GPU-hours, and total cost is 35–40% below data poisoning (Dong et al., 2 Jan 2026).

6. Stealth and Evasion of Defenses

Standard anomaly detection and data sanitization approaches fail to detect GAP-crafted updates in practice, including:

  • $\ell_2$-norm thresholding (0% detection rate),
  • reputation-based defense (FoolsGold; 1.2% detection rate),
  • subspace anomaly detection (Spectral Signatures; 0% detection rate).

Evasion rates are near 100% in all cases.

Geometric analysis (e.g., UMAP projections) shows malicious LoRA components from GAP attackers remain interleaved with benign updates across all layers and rounds.

In the data-poisoning regime, poisoned points are as in-distribution as clean data and evade state-of-the-art sanitization. Differentially private SGD and adversarial training reduce poisoning effectiveness only at the cost of substantial accuracy degradation (≥20%) (Geiping et al., 2020).

7. Implications, Open Problems, and Potential Defenses

GAP reveals a fundamental mismatch between protocol-level efficiency (via independently processed low-rank matrix updates in LoRA/federation, or data-point-level manipulation in gradient-matching poisoning) and robust aggregation. The multiplicative effect of LoRA's $AB$ update, or of aggregated gradient assembly in poison crafting, creates an irreducible blind spot for existing monitoring schemes.

Potential defenses include:

  • Composition Monitoring: Explicitly compute and verify $A_i B_i$ (or projections thereof) before accepting LoRA updates, at increased computational cost (see the sketch after this list).
  • Adaptive Verification: Deploy layer-specific anomaly detectors focused on more vulnerable components, adjusting thresholds dynamically as needed.
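
A composition-monitoring check could look like the following sketch: both the per-factor statistics and the recomposed product $A_i B_i$ are screened before aggregation. The thresholds and the toy construction of a "benign-looking but malicious" update are illustrative assumptions, not drawn from the papers.

```python
import numpy as np

def accepts_factors(A, B, factor_norm_max):
    """The check the text describes as insufficient: per-factor norm bounds only."""
    return (np.linalg.norm(A) <= factor_norm_max and
            np.linalg.norm(B) <= factor_norm_max)

def accepts_composition(A, B, product_norm_max):
    """Composition monitoring: also bound the recomposed update Delta_W = A @ B."""
    return np.linalg.norm(A @ B) <= product_norm_max

# Toy example: factors with modest norms whose product concentrates a
# disproportionately large update along one adversarially chosen direction.
rng = np.random.default_rng(0)
d, r, k = 64, 4, 64
u, v = rng.standard_normal((d, 1)), rng.standard_normal((1, k))
A = np.hstack([3.0 * u / np.linalg.norm(u), 0.01 * rng.standard_normal((d, r - 1))])
B = np.vstack([3.0 * v / np.linalg.norm(v), 0.01 * rng.standard_normal((r - 1, k))])

print(accepts_factors(A, B, factor_norm_max=5.0))        # True: passes per-factor checks
print(accepts_composition(A, B, product_norm_max=5.0))   # False: product norm (~9) exceeds bound
```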

This suggests that future federated and distributed learning schemes must incorporate joint verification of composite updates and model semantics, rather than relying solely on per-parameter or per-component checks.

Unresolved challenges include the design of scalable, practical defenses that can distinguish gradient-assembly-based malicious updates from benign ones without incurring prohibitive computational or accuracy cost, and the extension of such attacks and defenses to more complex modalities and tasks (Dong et al., 2 Jan 2026, Geiping et al., 2020).
