
Base-Model Drag Attacks (MPAF) in Federated Learning

Updated 24 April 2026
  • Base-Model Drag Attacks (MPAF) are a model poisoning method in which fake clients send synchronized updates to steer the global model toward a preselected, poorly performing base model.
  • The attack applies a fixed scaling factor to the malicious updates, allowing even a small fraction of fake clients to subvert standard aggregation rules such as FedAvg, coordinate-wise median, and trimmed-mean.
  • Empirical evaluations on datasets such as MNIST, Fashion-MNIST, and Purchase demonstrate significant accuracy drops, highlighting the need for robust defense mechanisms in federated learning.

Base-Model Drag Attacks (MPAF) are a class of model poisoning attacks against Federated Learning (FL) systems characterized by the introduction of fake clients whose sole objective is to steer the global model toward a fixed, attacker-chosen base model of poor accuracy. This attack paradigm fundamentally challenges the assumption that an adversary must corrupt a substantial fraction of genuine clients to exert significant influence. Instead, MPAF leverages carefully synchronized updates from a minority of fake clients to subvert standard and "Byzantine-robust" aggregation defenses, posing severe integrity risks to practical FL deployments (Cao et al., 2022).

1. Federated Learning Threat Model and Attack Setup

The primary scenario consists of $n$ genuine FL clients participating in the distributed learning of a model $w \in \mathbb{R}^d$, while $m$ attacker-controlled fake clients are injected into the system. The server orchestrates synchronous training rounds $t = 0, 1, \ldots, T-1$, each round sampling a fraction $\beta$ of all clients (with $\beta = 1$ as default). Genuine clients compute local updates $\Delta w_i^t$ via several local SGD steps; fake clients send arbitrary updates. The server aggregates all received updates using an aggregation rule $\mathcal{A}$ such as FedAvg (mean), coordinate-wise median, or trimmed-mean before applying a global step:

$$w^{t+1} \leftarrow w^t + \eta g^t,$$

where $g^t$ is the aggregated update and $\eta$ is the learning rate. Attackers do not observe benign data, benign updates, other system parameters, or even which rule $\mathcal{A}$ is used; they only receive the current global model in each round.
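The server side of this setup can be sketched in a few lines. This is a minimal illustration using plain Python lists, not the implementation from Cao et al.; the function names and the trimming parameter `k` are assumptions introduced here.

```python
import statistics

def fedavg(updates):
    """Unweighted coordinate-wise mean of equal-length update vectors."""
    return [sum(col) / len(updates) for col in zip(*updates)]

def coordinate_median(updates):
    """Coordinate-wise median of the update vectors."""
    return [statistics.median(col) for col in zip(*updates)]

def trimmed_mean(updates, k):
    """Per coordinate, drop the k largest and k smallest values, then average.

    Assumes 2 * k < len(updates); k is a hypothetical parameter name.
    """
    out = []
    for col in zip(*updates):
        kept = sorted(col)[k:len(col) - k]
        out.append(sum(kept) / len(kept))
    return out

def server_step(w, updates, aggregate, eta):
    """One global step: w <- w + eta * g, where g = aggregate(updates)."""
    g = aggregate(updates)
    return [wj + eta * gj for wj, gj in zip(w, g)]

# Three client updates in two dimensions; the last one is an extreme outlier.
updates = [[1.0, 2.0], [3.0, 4.0], [100.0, -100.0]]
w_next = server_step([0.0, 0.0], updates, coordinate_median, eta=0.5)
```

Passing the aggregation rule as a function keeps `server_step` identical across FedAvg, median, and trimmed-mean, which mirrors the fact that the attacker does not need to know which rule is in use.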

2. Attack Principle and Mathematical Formulation

The MPAF exploit centers on maintaining a persistent directional influence on the global model $w^t$ by "dragging" it toward a fixed base model $w'$ that the attacker selects upfront (e.g., a randomly initialized model with uniformly low classification accuracy). In each round, every fake client contributes an identical update:

$$\Delta w_i^t = \lambda \, (w' - w^t) \quad \text{for every fake client } i,$$

where $\lambda$ is a scaling factor chosen so that the magnitude is competitive with or dominates genuine client updates.

This construction operates purely in parameter space: the malicious updates always point toward $w'$, regardless of the current model position, ensuring that over multiple rounds, if not neutralized, $w^t$ will progressively approach the base model.

The attacker's optimality criterion can be viewed as greedily minimizing the distance $\|w^{t+1} - w'\|$ between the next global model and the base model, making each step a solution to the instantaneous subproblem of maximal drag toward $w'$.
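To see why the drag persists, consider a worked case under two simplifying assumptions that are not taken from the source: full participation ($\beta = 1$) and unweighted FedAvg, with genuine clients indexed $i = 1, \ldots, n$. Writing the base model as $w'$ and the scaling factor as $\lambda$, substituting the fake update $\lambda (w' - w^t)$ into the global step $w^{t+1} \leftarrow w^t + \eta g^t$ gives:

```latex
\begin{aligned}
g^t &= \frac{1}{n+m}\Big(\sum_{i=1}^{n} \Delta w_i^t + m\,\lambda\,(w' - w^t)\Big),\\
w^{t+1} - w' &= \Big(1 - \frac{\eta\, m\, \lambda}{n+m}\Big)(w^t - w')
  + \frac{\eta}{n+m}\sum_{i=1}^{n} \Delta w_i^t .
\end{aligned}
```

Under these assumptions, the first term multiplies the gap to $w'$ by $1 - \eta m \lambda/(n+m)$ every round, which shrinks it whenever $0 < \eta m \lambda/(n+m) < 2$. The benign updates enter only through the averaged second term, so they must counteract that contraction in every round to keep the model away from $w'$.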

3. Aggregation, Defense Bypass, and Algorithmic Realization

Standard aggregation rules interact with this attack as follows:

  • FedAvg: The mean is directly susceptible, as even 1% fake clients with a large $\lambda$ suffice to overwhelm the aggregate and collapse global accuracy to random guessing.
  • Coordinate-wise Median & Trimmed-Mean: Each fake update is identical and aligned in parameter space, making it a consistent "outlier" across all affected coordinates; if $m$ is sufficiently large relative to $n$, even clipped or robust rules are pressured to accommodate the attacker's direction. When trimmed-mean is used with a fixed trimming parameter, malicious updates persistently influence the aggregated result once the number of fake clients surpasses the rule's resilience threshold.
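The two behaviours above can be illustrated on a single coordinate. The numbers below are made up for illustration and are not results from the paper: case 1 shows one heavily scaled fake update among 100 clients flipping the mean, and case 2 shows identical fake updates shifting the median of spread-out genuine updates before the fakes form a majority.

```python
import statistics

def mean(values):
    return sum(values) / len(values)

def poisoned(genuine, fake_value, m):
    """Genuine scalar updates plus m identical fake updates."""
    return list(genuine) + [fake_value] * m

# Case 1: 99 genuine updates of +1.0 and a single fake update of -1000.0 (1% fake).
case1 = poisoned([1.0] * 99, -1000.0, 1)
mean_case1 = mean(case1)                  # the mean is dragged negative
median_case1 = statistics.median(case1)   # the median ignores a single outlier

# Case 2: genuine updates spread over [-0.20, 0.79]; the fakes all sit at -50.0.
genuine = [(i - 20) / 100 for i in range(100)]
median_by_m = {
    m: statistics.median(poisoned(genuine, -50.0, m)) for m in (0, 30, 70)
}
# With m = 70 the fakes are 70 of 170 clients (a minority), yet the median
# has moved from positive to negative.
```

The median is not dominated the way the mean is, but because every fake value lands on the same side, each additional fake client pushes the selected order statistic further in the attacker's direction.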

The MPAF procedure can be formalized as:

MPAF Algorithm

  • Inputs: the FL configuration, base model $w'$, scaling factor $\lambda$
  • For each round $t = 0$ to $T-1$:
    • Server broadcasts $w^t$ to selected clients.
    • Each genuine client $i$ returns $\Delta w_i^t$ via SGD; each fake client returns $\lambda (w' - w^t)$.
    • Server aggregates the received updates using $\mathcal{A}$ to obtain $g^t$; updates $w^{t+1} \leftarrow w^t + \eta g^t$.
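The loop above can be simulated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's experimental setup: genuine clients minimize a quadratic around a hypothetical optimum, the server uses unweighted FedAvg with full participation, and all names (`run`, `lam`, `optimum`, `base`) are introduced here.

```python
import math
import random

def genuine_update(w, target, lr=0.1, steps=5):
    """Run a few gradient steps on 0.5 * ||w - target||^2; return the model delta."""
    local = list(w)
    for _ in range(steps):
        local = [x - lr * (x - t) for x, t in zip(local, target)]
    return [a - b for a, b in zip(local, w)]

def fake_update(w, base, lam):
    """Fake-client update: a scaled step from the current model toward the base model."""
    return [lam * (b - x) for b, x in zip(base, w)]

def fedavg(updates):
    """Unweighted coordinate-wise mean."""
    return [sum(col) / len(updates) for col in zip(*updates)]

def run(n, m, T, base, lam, optimum, eta=1.0, seed=0):
    """Simulate T rounds with n genuine and m fake clients; return the final model."""
    rng = random.Random(seed)
    # Hypothetical benign clients: each holds a noisy copy of the optimum.
    targets = [[o + rng.gauss(0.0, 0.1) for o in optimum] for _ in range(n)]
    w = [0.0 for _ in optimum]
    for _ in range(T):
        updates = [genuine_update(w, tgt) for tgt in targets]
        updates += [fake_update(w, base, lam) for _ in range(m)]
        g = fedavg(updates)
        w = [x + eta * gx for x, gx in zip(w, g)]
    return w

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

optimum, base = [1.0, -2.0], [5.0, 5.0]
clean = run(n=90, m=0, T=50, base=base, lam=10.0, optimum=optimum)
attacked = run(n=90, m=10, T=50, base=base, lam=10.0, optimum=optimum)
print("clean    distance to optimum:", dist(clean, optimum))
print("attacked distance to optimum:", dist(attacked, optimum))
```

In this toy setting, 10% fake clients pull the final model most of the way from the benign optimum to the base model. The value `lam=10.0` was picked so the linear iteration stays stable; considerably larger values make this particular toy overshoot and diverge, which also ruins the model but is a property of the quadratic example rather than a claim about the paper's experiments.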

Empirically, $\lambda$ is chosen so that the fake-update magnitude matches large but plausible update magnitudes. The attack saturates once $\lambda$ is large enough, indicating insensitivity to the exact learning rate or benign gradient scale (Cao et al., 2022).

4. Experimental Evaluation and Empirical Impact

Experiments on MNIST, Fashion-MNIST, and Purchase datasets reveal the potency of MPAF:

  • With FedAvg, as few as 1% fake clients suffice to degrade test accuracy to random chance (e.g., 10% for a 10-class task).
  • With median or trimmed-mean aggregation, MPAF still induces a significant accuracy drop on Purchase and MNIST, while baseline noise-injection attacks inflict a smaller one.
  • As the fraction of fake clients increases, degradation deepens (e.g., 32% → 49% on Purchase).
  • Reducing the participation rate $\beta$ down to $0.01$ (1% of clients per round) does not mitigate the attack: MPAF remains highly effective regardless of the client sampling rate.
  • Raising the scaling factor $\lambda$ above unity rapidly maximizes the attack's effect; efficacy is robust to the precise scale, with near-optimal performance once $\lambda$ is sufficiently large.
  • Norm clipping curbs outlier update norms but cannot prevent the attack unless the clipping threshold is set so low that benign training quality also collapses. For practical thresholds, MPAF still degrades accuracy.
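The norm-clipping point can be made concrete with a short sketch. The threshold name `tau` and the example vectors are assumptions for illustration only; the code shows that clipping bounds the length of a fake update but leaves its direction untouched.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip_update(update, tau):
    """Rescale the update so its L2 norm is at most tau; the direction is unchanged."""
    norm = l2_norm(update)
    if norm <= tau:
        return list(update)
    return [x * tau / norm for x in update]

fake = [300.0, -400.0]   # heavily scaled fake update, L2 norm 500
benign = [0.3, 0.4]      # small benign update, L2 norm about 0.5

clipped_fake = clip_update(fake, tau=5.0)      # shrunk to norm 5, same direction
clipped_benign = clip_update(benign, tau=5.0)  # already within the threshold
```

Because every fake client sends the same direction, the clipped fake updates add up coherently (several fakes at norm `tau` contribute several times `tau` along one line), whereas benign updates that point in differing directions partly cancel. Lowering `tau` shrinks both kinds of update, which is consistent with the observation that a threshold low enough to stop the attack also harms benign training.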

These results highlight how standard defenses are structurally vulnerable to coordinated persistent poisoning via fake clients.

5. Limitations of Existing Defenses and Security Implications

The qualitative hallmark of MPAF is directional consistency: fake updates always point toward the base model $w'$ and are perfectly synchronized. Classic robust aggregation and norm-clipping approaches are primarily designed for magnitude outliers or coordinate-wise extremal behavior; they are fundamentally limited in the presence of many clients persistently dragging the global update in the same direction.

The paper demonstrates that:

  • Median and trimmed-mean rules are not immune, as synchronized malicious updates avoid randomization and can pass robust filters if the fraction of fake clients is not vanishingly small.
  • Norm clipping is ineffective at realistic thresholds.

Table: Defense Impact Summary

Aggregation Rule | Minimal Fake Fraction for Major Degradation | Test Accuracy Drop
FedAvg | 1% | Down to random chance
Median, Trimmed-mean | [...] | [...]
Norm clipping | [...] | [...] at benign clipping thresholds

6. Future Research and Open Directions

Proposed mitigations and research priorities include:

  • Development of aggregation or anomaly detection rules that analyze directional consistency across rounds, not just per-round magnitude outliers.
  • Extension of MPAF to targeted poisoning (e.g., choosing the base model $w'$ to encode a backdoor or trigger-specific behavior).
  • Use of side-channel or auxiliary public data to validate global model direction and certify resistance to base-model drag.
  • Exploration of formally provable defenses, such as ensemble methods with redundant computation or voting, and systematic resilience checks for fake-client influence.
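As a rough illustration of the first direction listed above, the sketch below scores how persistently a single client's updates point the same way across consecutive rounds. It is a hypothetical heuristic written for this article, not a defense proposed or evaluated in the paper, and the function names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def direction_persistence(history):
    """Mean cosine similarity between a client's updates in consecutive rounds.

    history is a list of update vectors, one per round, for one client.
    """
    pairs = list(zip(history, history[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# A client that keeps pushing along one line versus one whose direction varies.
dragging_client = [[2.0, 2.0], [1.5, 1.5], [1.0, 1.0]]
varying_client = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

score_dragging = direction_persistence(dragging_client)  # close to 1
score_varying = direction_persistence(varying_client)    # close to 0
```

A score near 1 over many rounds is what updates of the form $\lambda (w' - w^t)$ would produce while the global model is still far from $w'$. Genuine clients early in training can also show consistent directions, so any threshold on this score would need validation before being treated as a detector.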

The paper underscores that adversaries need not compromise genuine clients to achieve catastrophic sabotage in FL; fake-client injection with persistent, strategically aligned parameter updates can undermine standard defense assumptions, motivating a fundamental reevaluation of federated system security (Cao et al., 2022).
