Base-Model Drag Attacks (MPAF) in Federated Learning

Updated 24 April 2026

Base-Model Drag Attacks (MPAF) are a model poisoning method in which fake clients send synchronized updates to steer the global model toward a preselected, poorly performing base model.
The attack applies a fixed scaling factor to malicious updates, allowing even a small fraction of fake clients to overwhelm plain averaging (FedAvg) and to degrade robust aggregation rules such as median and trimmed-mean.
Empirical evaluations on datasets such as MNIST, Fashion-MNIST, and Purchase demonstrate significant accuracy drops, highlighting the need for robust defense mechanisms in federated learning.

Base-Model Drag Attacks (MPAF) are a class of model poisoning attacks against Federated Learning (FL) systems characterized by the introduction of fake clients whose sole objective is to steer the global model toward a fixed, attacker-chosen base model of poor accuracy. This attack paradigm fundamentally challenges the assumption that an adversary must corrupt a substantial fraction of genuine clients to exert significant influence. Instead, MPAF leverages carefully synchronized updates from a minority of fake clients to subvert standard and "Byzantine-robust" aggregation defenses, posing severe integrity risks to practical FL deployments (Cao et al., 2022).

1. Federated Learning Threat Model and Attack Setup

The primary scenario consists of $n$ genuine FL clients participating in the distributed learning of a model $w \in \mathbb{R}^d$, while $m$ attacker-controlled fake clients are injected into the system. The server orchestrates synchronous training rounds $t = 0, 1, \ldots, T-1$, each round sampling a fraction $\beta$ of all clients (with $\beta=1$ as default). Genuine clients compute local updates $\Delta w_i^t$ via several local SGD steps; fake clients send arbitrary updates. The server aggregates all received updates using an aggregation rule $\mathcal{A}$ such as FedAvg (mean), coordinate-wise median, or trimmed-mean before applying a global step:

$w^{t+1} \leftarrow w^t + \eta g^t,$

where $g^t$ is the aggregated update and $\eta$ is the learning rate. Attackers do not observe benign data or updates, other system parameters, or even which rule $\mathcal{A}$ is used; they only receive the current global model in each round.
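The round structure above can be sketched in a few lines. This is a minimal illustration, assuming FedAvg as the rule $\mathcal{A}$ and plain Python lists as model vectors; the names `fedavg` and `server_round` are hypothetical and not taken from the paper.

```python
import random

def fedavg(updates):
    """Coordinate-wise mean of equal-length update vectors (FedAvg)."""
    d = len(updates[0])
    return [sum(u[j] for u in updates) / len(updates) for j in range(d)]

def server_round(w, client_updates, eta, aggregate=fedavg, beta=1.0, rng=None):
    """One synchronous round: sample a fraction beta of clients,
    aggregate their updates into g, then apply w <- w + eta * g."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    k = max(1, round(beta * len(client_updates)))
    sampled = rng.sample(client_updates, k)
    g = aggregate(sampled)
    return [wj + eta * gj for wj, gj in zip(w, g)]

w0 = [0.0, 0.0]
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w1 = server_round(w0, updates, eta=0.5)
print(w1)  # [1.5, 2.0]
```

Passing a different `aggregate` function (median, trimmed-mean) changes only the rule $\mathcal{A}$; the global step itself stays the same.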

2. Attack Principle and Mathematical Formulation

The MPAF exploit centers on maintaining a persistent directional influence on the global model $w^t$ by "dragging" it toward a fixed base model, denoted $w'$, that the attacker selects upfront (e.g., a randomly initialized model with uniformly low classification accuracy). In each round, every fake client $i$ contributes an identical update:

$\Delta w_i^t = \lambda \, (w' - w^t),$

where $\lambda > 0$ is a scaling factor to ensure the magnitude is competitive with or dominates genuine client updates.

This construction operates purely in parameter space: the malicious updates always point toward $w'$, regardless of the current model position, ensuring that over multiple rounds, if not neutralized, $w^t$ will progressively approach the base model.

The attacker’s optimality criterion can be viewed as greedily minimizing the distance $\|w^{t+1} - w'\|$ between the updated global model and the base model, making each step a solution to the instantaneous subproblem of maximal drag toward $w'$.
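As a worked step under stated assumptions (FedAvg as the rule $\mathcal{A}$, full participation $\beta = 1$, and the notation $w'$ for the base model and $\lambda$ for the scaling factor assumed here), substituting the fake update $\lambda (w' - w^t)$ into the global step shows how the drag enters each round:

```latex
g^t = \frac{1}{n+m}\Big(\sum_{i=1}^{n} \Delta w_i^t + m\,\lambda\,(w' - w^t)\Big)
\quad\Longrightarrow\quad
w^{t+1} - w' = \Big(1 - \frac{\eta \lambda m}{n+m}\Big)(w^t - w')
             + \frac{\eta}{n+m}\sum_{i=1}^{n} \Delta w_i^t .
```

The first term shrinks the gap to $w'$ whenever $0 < \eta \lambda m / (n+m) < 2$, and the second term is the benign clients' pull. Because the fake contribution grows with the product $\lambda m$, a small $m$ can be compensated by a large $\lambda$ under plain averaging; for still larger $\lambda$ the step overshoots $w'$ instead of contracting toward it, but the fake term continues to dominate the update.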

3. Aggregation, Defense Bypass, and Algorithmic Realization

Standard aggregation rules interact with this attack as follows:

FedAvg: The mean is directly susceptible, as even 1% fake clients with a large scaling factor $\lambda$ suffice to overwhelm the aggregate and collapse global accuracy to random guessing.
Coordinate-wise Median & Trimmed-Mean: Each fake update is identical and aligned in parameter space, making it a consistent "outlier" across all affected coordinates; if the number of fake clients $m$ is sufficiently large relative to the number of genuine clients $n$, even clipped or robust rules are pressured to accommodate the attacker's direction. When trimmed-mean is used, malicious updates persistently influence aggregated results once $m$ surpasses the resilience threshold set by the trimming parameter.
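The interaction with robust rules can be illustrated with a toy one-dimensional example. This is a sketch under assumptions: the update values are invented for illustration, `k` is a hypothetical name for the number of values trimmed from each end, and the functions are simple reference versions of coordinate-wise median and trimmed-mean rather than any particular library's implementation.

```python
from statistics import median

def coordinate_median(updates):
    """Coordinate-wise median of equal-length update vectors."""
    d = len(updates[0])
    return [median(u[j] for u in updates) for j in range(d)]

def trimmed_mean(updates, k):
    """Per coordinate: drop the k smallest and k largest values, then average."""
    d = len(updates[0])
    out = []
    for j in range(d):
        col = sorted(u[j] for u in updates)
        kept = col[k:len(col) - k]
        out.append(sum(kept) / len(kept))
    return out

# Five benign 1-D updates (made-up values) and two identical fake updates.
benign = [[0.8], [0.9], [1.0], [1.1], [1.2]]
fake = [[-100.0], [-100.0]]
everyone = benign + fake

print(coordinate_median(benign))    # [1.0]
print(coordinate_median(everyone))  # [0.9], shifted toward the fake direction
print(trimmed_mean(everyone, k=2))  # about 0.9: fakes trimmed, small shift remains
print(trimmed_mean(everyone, k=1))  # about -19.2: one fake survives trimming
```

With `k=2` both fake values are removed, but two benign values are trimmed from the other end, so the result still moves slightly in the attacker's direction. With `k=1` the number of fake clients exceeds what trimming can absorb and the aggregate is dominated by the surviving fake update.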

The MPAF procedure can be formalized as:

MPAF Algorithm

Inputs: $n$, $m$, $T$, base model $w'$, scaling factor $\lambda$
For each round $t = 0$ to $T-1$:
- Server broadcasts $w^t$ to selected clients.
- Each genuine client $i$ returns $\Delta w_i^t$ via SGD; each fake client returns $\lambda (w' - w^t)$.
- Server aggregates $\{\Delta w_i^t\}$ using $\mathcal{A}$; updates $w^{t+1} \leftarrow w^t + \eta g^t$.

Empirically, the scaling factor $\lambda$ is chosen so that the fake update magnitude $\lambda \|w' - w^t\|$ matches large but plausible update magnitudes. The attack saturates once $\lambda$ is sufficiently large, indicating insensitivity to exact learning rate or benign gradient scale (Cao et al., 2022).
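The procedure can be simulated end to end on a toy problem. The following is a sketch, not the paper's experimental setup: it assumes FedAvg as the aggregation rule, replaces local SGD with a stand-in benign update `w_star - w` that pulls toward a hypothetical good model `w_star`, and uses made-up values for `eta` and `lam`. The function name `run_rounds` is illustrative.

```python
import math

def fedavg(updates):
    """Coordinate-wise mean of equal-length update vectors."""
    d = len(updates[0])
    return [sum(u[j] for u in updates) / len(updates) for j in range(d)]

def run_rounds(n, m, T, w_base, lam, w_star, eta=0.1):
    """Run T rounds with n genuine and m fake clients, starting from w = 0."""
    w = [0.0] * len(w_star)
    for _ in range(T):
        # Stand-in for local SGD: each genuine client pulls toward w_star.
        benign = [[s - x for s, x in zip(w_star, w)] for _ in range(n)]
        # Fake clients all send the same update lam * (w_base - w).
        fake = [[lam * (b - x) for b, x in zip(w_base, w)] for _ in range(m)]
        g = fedavg(benign + fake)
        w = [x + eta * gj for x, gj in zip(w, g)]
    return w

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

w_star, w_base = [1.0, 1.0], [-1.0, -1.0]
clean = run_rounds(n=9, m=0, T=200, w_base=w_base, lam=50.0, w_star=w_star)
attacked = run_rounds(n=9, m=1, T=200, w_base=w_base, lam=50.0, w_star=w_star)
print(dist(clean, w_star))                               # close to zero
print(dist(attacked, w_base), dist(attacked, w_star))    # nearer to w_base
```

In this toy setting a single fake client out of ten is enough to pull the converged model much closer to `w_base` than to `w_star`, because its contribution is weighted by `lam`. The values here are chosen so the iteration stays stable; they say nothing about the magnitudes used in the paper.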

4. Experimental Evaluation and Empirical Impact

Experiments on MNIST, Fashion-MNIST, and Purchase datasets reveal the potency of MPAF:

With FedAvg, as few as 1% fake clients suffice to degrade test accuracy to random chance (e.g., 10% for a 10-class task).
With median or trimmed-mean aggregation, a modest fraction of fake clients induces a substantial accuracy drop on Purchase and on MNIST, while baseline noise-injection attacks inflict a smaller drop.
As the fraction of fake clients increases, degradation deepens (e.g., 32% → 49% on Purchase).
Reducing the participation rate $\beta$ down to $0.01$ (1% of clients per round) does not mitigate the attack: MPAF remains highly effective independent of client sampling strategies.
Tuning the scaling factor $\lambda$ above unity rapidly maximizes the attack’s effect; efficacy is robust to the precise scale, as near-optimal performance is achieved once $\lambda$ is sufficiently large.
Norm clipping curbs outlier update norms but cannot prevent the attack unless the clipping threshold is set so low that benign training quality also collapses. For practical threshold values, MPAF still degrades accuracy substantially.
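The norm-clipping observation can be made concrete with a short sketch. It assumes the common form of clipping (rescale any update whose Euclidean norm exceeds a threshold `c`); the function name `clip_update` and all numeric values are illustrative assumptions, not settings from the paper.

```python
import math

def clip_update(u, c):
    """Rescale u so its Euclidean norm is at most c; direction is unchanged."""
    norm = math.sqrt(sum(x * x for x in u))
    if norm <= c or norm == 0.0:
        return list(u)
    return [x * c / norm for x in u]

w = [0.0, 0.0]
w_base = [-3.0, -4.0]   # hypothetical base model
lam = 1000.0            # hypothetical scaling factor
fake = [lam * (b - x) for b, x in zip(w_base, w)]   # norm 5000

clipped_fake = clip_update(fake, c=1.0)
clipped_benign = clip_update([0.3, 0.4], c=1.0)     # norm 0.5, left untouched

print(clipped_fake)    # norm reduced to 1.0, still pointing at w_base
print(clipped_benign)  # [0.3, 0.4]
```

Clipping bounds the magnitude of each fake update but leaves its direction intact, so every fake client still contributes a maximum-norm vector aimed at the base model. Lowering `c` further would also start shrinking benign updates, which is the trade-off described above.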

These results highlight how standard defenses are structurally vulnerable to coordinated persistent poisoning via fake clients.

5. Limitations of Existing Defenses and Security Implications

The qualitative hallmark of MPAF is directional consistency: fake updates always point toward the base model $w'$ and are perfectly synchronized. Classic robust aggregation and norm-clipping approaches are primarily designed for magnitude outliers or coordinate-wise extremal behavior; they are fundamentally limited in the presence of many clients persistently dragging the global update in the same direction.

The paper demonstrates that:

Median and trimmed-mean rules are not immune, as synchronized malicious updates avoid randomization and can pass robust filters if the fraction of fake clients is not vanishingly small.
Norm clipping is ineffective at realistic thresholds.

Table: Defense Impact Summary

Aggregation Rule	Minimal Fake Fraction for Major Degradation	Test Accuracy Drop
FedAvg	1%	Down to random chance
Median, Trimmed-mean	Modest, non-negligible fraction	Substantial (see Section 4)
Norm clipping	n/a	Substantial at practical clipping thresholds

6. Future Research and Open Directions

Proposed mitigations and research priorities include:

Development of aggregation or anomaly detection rules that analyze directional consistency across rounds, not just per-round magnitude outliers.
Extension of MPAF to targeted poisoning (e.g., choosing the base model $w'$ to encode a backdoor or trigger-specific behavior).
Use of side-channel or auxiliary public data to validate global model direction and certify resistance to base-model drag.
Exploration of formally provable defenses, such as ensemble methods with redundant computation or voting, and systematic resilience checks for fake-client influence.
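The first direction above, analyzing directional consistency across rounds, could be prototyped along the following lines. This is a speculative sketch and not a defense proposed or evaluated in the paper: it assumes the server keeps a per-client history of updates and scores each client by the mean cosine similarity between its consecutive updates. The name `consistency_score` and the example histories are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def consistency_score(history):
    """Mean cosine similarity between consecutive updates of one client."""
    pairs = list(zip(history, history[1:]))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Made-up histories: a fake client always pushes in one direction,
# while a benign client's updates vary from round to round.
fake_history = [[-2.0, -2.0], [-1.5, -1.5], [-1.0, -1.0]]
benign_history = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]

print(consistency_score(fake_history))    # close to 1.0
print(consistency_score(benign_history))  # well below 1.0
```

A score near 1 over many rounds is only a signal, not proof of an attack: benign clients early in training can also move in a stable direction, so any threshold would need to be validated empirically.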

The paper underscores that adversaries need not compromise genuine clients to achieve catastrophic sabotage in FL; fake-client injection with persistent, strategically aligned parameter updates can undermine standard defense assumptions, motivating a fundamental reevaluation of federated system security (Cao et al., 2022).

References

Cao et al. (2022). MPAF: Model Poisoning Attacks to Federated Learning based on Fake Clients.
