Base-Model Drag Attacks (MPAF) in Federated Learning
- Base-Model Drag Attacks (MPAF) are a novel model poisoning method in which fake clients send synchronized updates that steer the global model toward a preselected, poorly performing base model.
- The attack applies a fixed scaling factor to the malicious updates, so even a small fraction of fake clients can overwhelm plain FedAvg and substantially degrade Byzantine-robust aggregation rules such as coordinate-wise median and trimmed-mean.
- Empirical evaluations on datasets such as MNIST, Fashion-MNIST, and Purchase demonstrate significant accuracy drops, highlighting the need for robust defense mechanisms in federated learning.
Base-Model Drag Attacks (MPAF) are a class of model poisoning attacks against Federated Learning (FL) systems characterized by the introduction of fake clients whose sole objective is to steer the global model toward a fixed, attacker-chosen base model of poor accuracy. This attack paradigm fundamentally challenges the assumption that an adversary must corrupt a substantial fraction of genuine clients to exert significant influence. Instead, MPAF leverages carefully synchronized updates from a minority of fake clients to subvert standard and "Byzantine-robust" aggregation defenses, posing severe integrity risks to practical FL deployments (Cao et al., 2022).
1. Federated Learning Threat Model and Attack Setup
The primary scenario consists of $n$ genuine FL clients participating in the distributed learning of a global model $w$, while $m$ attacker-controlled fake clients are injected into the system. The server orchestrates synchronous training rounds $t = 0, 1, \dots, T-1$, and in each round it samples a fraction $\beta$ of all clients. Genuine clients compute local model updates $g_i^t$ (local model minus current global model) via several local SGD steps; fake clients may send arbitrary updates. The server aggregates the received updates with an aggregation rule $\mathcal{A}$, such as FedAvg (mean), coordinate-wise median, or trimmed-mean, and then applies a global step:

$$w^{t+1} = w^t + \eta\, g^t, \qquad g^t = \mathcal{A}\big(\{g_i^t : i \in S^t\}\big),$$

where $S^t$ is the set of clients sampled in round $t$, $g^t$ is the aggregated update, and $\eta > 0$ is the learning rate. Attackers observe neither the genuine clients' data nor their updates, nor server hyperparameters such as $\eta$, nor even which rule $\mathcal{A}$ is used; they only receive the current global model $w^t$ in each round.
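The server-side round described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code. Models and updates are plain lists of floats, and updates are model differences added with $w^{t+1} = w^t + \eta g^t$. The helper names (`fedavg`, `coord_median`, `trimmed_mean`, `global_step`) are hypothetical.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

Vector = List[float]

def fedavg(updates: Sequence[Vector]) -> Vector:
    """FedAvg: coordinate-wise mean of the received updates."""
    return [mean(coords) for coords in zip(*updates)]

def coord_median(updates: Sequence[Vector]) -> Vector:
    """Coordinate-wise median of the received updates."""
    return [median(coords) for coords in zip(*updates)]

def trimmed_mean(updates: Sequence[Vector], k: int = 1) -> Vector:
    """Per coordinate, drop the k largest and k smallest values and average the rest."""
    if len(updates) <= 2 * k:
        raise ValueError("trimmed-mean needs more than 2k updates")
    result = []
    for coords in zip(*updates):
        kept = sorted(coords)[k:len(coords) - k]
        result.append(mean(kept))
    return result

def global_step(w: Vector, updates: Sequence[Vector],
                rule: Callable[[Sequence[Vector]], Vector], eta: float) -> Vector:
    """Aggregate with `rule`, then apply w^{t+1} = w^t + eta * g^t (assumed sign convention)."""
    g = rule(updates)
    return [wi + eta * gi for wi, gi in zip(w, g)]
```

To use a different trimming parameter, pass e.g. `functools.partial(trimmed_mean, k=2)` as `rule`.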
2. Attack Principle and Mathematical Formulation
MPAF maintains a persistent directional influence on the global model $w^t$ by "dragging" it toward a fixed base model $w'$ that the attacker selects in advance (e.g., a randomly initialized model with uniformly low classification accuracy). In each round, every fake client $j$ sends the same model update:

$$g_j^t = \lambda\,(w' - w^t),$$

where $\lambda > 0$ is a scaling factor chosen so that the magnitude of the fake updates is competitive with, or dominates, the genuine client updates.
This construction operates purely in parameter space. The malicious updates always point from the current model toward $w'$, wherever $w^t$ currently lies. Unless the updates are neutralized, $w^t$ therefore progressively approaches the base model over multiple rounds.
The attacker's strategy can be viewed as greedily minimizing $\|w^{t+1} - w'\|$, so that each step solves the instantaneous subproblem of maximal drag toward $w'$.
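A short worked derivation helps explain why a large $\lambda$ lets a small fraction of fake clients dominate. The derivation rests on an idealizing assumption, not on the paper's own analysis. It takes FedAvg with all $n$ genuine and $m$ fake clients sampled and applies the global step $w^{t+1} = w^t + \eta g^t$. It writes $p = m/(n+m)$ for the fake fraction and treats the mean genuine update $\bar g^t$ as given:

$$g^t = (1-p)\,\bar g^t + p\,\lambda\,(w' - w^t)$$

$$w^{t+1} - w' = (1 - \eta p \lambda)\,(w^t - w') + \eta\,(1-p)\,\bar g^t$$

Ignoring the genuine term, each round multiplies the distance to $w'$ by $|1 - \eta p \lambda|$, so a larger $\lambda$ compensates for a smaller $p$. With $\eta p \lambda = 1$, the fake contribution alone would land exactly on $w'$, which corresponds to the greedy choice described above. In this idealized model, $\eta p \lambda > 2$ overshoots, and the iterate moves away from $w'$ instead of converging to it.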
3. Aggregation, Defense Bypass, and Algorithmic Realization
Standard aggregation rules interact with this attack as follows:
- FedAvg: The mean offers no protection; even 1% fake clients with a large scaling factor $\lambda$ suffice to overwhelm the aggregate and collapse global accuracy to random guessing.
- Coordinate-wise Median & Trimmed-Mean: All fake updates are identical, so on every coordinate they stack up on the same side of the genuine values rather than scattering like independent outliers. If the number of fake clients $m$ is sufficiently large relative to the number of genuine clients $n$, the coordinate-wise median shifts toward the attacker's value on every affected coordinate. Trimmed-mean with trimming parameter $k$ removes only the $k$ largest and $k$ smallest values per coordinate. When $m > k$, at least $m - k$ copies of the fake value survive trimming and, scaled by a large $\lambda$, dominate the average.
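The effect on a single coordinate can be checked numerically. The sketch below uses hypothetical values: seven genuine values and a fake value of 50.0 standing in for $\lambda (w'_j - w^t_j)$. It adds $m$ identical fake values and reports the coordinate-wise median and the trimmed-mean with $k = 1$.

```python
from statistics import mean, median

# Hypothetical values of one coordinate from seven genuine updates.
BENIGN = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
# Hypothetical value of lambda * (w'_j - w^t_j) for the same coordinate;
# every fake client sends exactly this value.
FAKE_VALUE = 50.0

def trimmed_mean_1d(values, k):
    """Drop the k largest and k smallest values, then average the rest."""
    kept = sorted(values)[k:len(values) - k]
    return mean(kept)

def aggregates_with_fakes(m, k=1):
    """Return (median, trimmed-mean) of one coordinate with m identical fake values."""
    values = BENIGN + [FAKE_VALUE] * m
    return median(values), trimmed_mean_1d(values, k)

for m in range(4):
    med, tm = aggregates_with_fakes(m)
    print(f"m={m}: median={med:.3f}, trimmed-mean(k=1)={tm:.3f}")
```

With one fake client, trimming removes the fake value entirely. With $m = 2 > k$, one copy survives and the trimmed-mean jumps far outside the genuine range. The median moves only within the range of genuine values, but it moves in the attacker's direction, and MPAF repeats that push on every affected coordinate in every round.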
The MPAF procedure can be formalized as:
MPAF Algorithm
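- Inputs: initial global model $w^0$, number of rounds $T$, participation rate $\beta$, base model $w'$, scaling factor $\lambda$
- For each round $t = 0, 1, \dots, T-1$:
  - Server samples a fraction $\beta$ of all clients and broadcasts $w^t$ to them.
  - Each sampled genuine client $i$ returns $g_i^t$ computed via local SGD; each sampled fake client $j$ returns $g_j^t = \lambda\,(w' - w^t)$.
  - Server aggregates the received updates into $g^t$ using its aggregation rule $\mathcal{A}$ and updates $w^{t+1} = w^t + \eta\, g^t$.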
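The procedure above can be simulated end to end on a toy problem. This is a sketch under loudly stated assumptions.

- The `genuine_update` helper is a hypothetical stand-in for local SGD: a noisy step toward a hypothetical "good" model `w_good`.
- The server uses FedAvg and the plus-sign global step.
- All names (`run_mpaf`, `w_good`, `w_base`) are illustrative, not from the paper.

```python
import math
import random
from typing import List

Vector = List[float]

def fedavg(updates: List[Vector]) -> Vector:
    """Coordinate-wise mean of the received updates."""
    return [sum(coords) / len(coords) for coords in zip(*updates)]

def genuine_update(w: Vector, w_good: Vector, rng: random.Random,
                   noise: float = 0.01) -> Vector:
    """Hypothetical stand-in for local SGD: a noisy step toward a 'good' model."""
    return [(g - x) + rng.gauss(0.0, noise) for x, g in zip(w, w_good)]

def fake_update(w: Vector, w_base: Vector, lam: float) -> Vector:
    """MPAF fake update lambda * (w' - w^t); identical for every fake client."""
    return [lam * (b - x) for x, b in zip(w, w_base)]

def run_mpaf(n: int, m: int, T: int, eta: float, lam: float, w0: Vector,
             w_good: Vector, w_base: Vector, beta: float = 1.0,
             seed: int = 0) -> Vector:
    rng = random.Random(seed)
    clients = ["genuine"] * n + ["fake"] * m
    w = list(w0)
    for _ in range(T):
        # Server samples a fraction beta of all clients and broadcasts w^t.
        selected = rng.sample(clients, max(1, round(beta * len(clients))))
        updates = [genuine_update(w, w_good, rng) if c == "genuine"
                   else fake_update(w, w_base, lam) for c in selected]
        # Aggregate, then take the global step w^{t+1} = w^t + eta * g^t.
        g = fedavg(updates)
        w = [x + eta * gi for x, gi in zip(w, g)]
    return w

w_good, w_base, w0 = [1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]
clean = run_mpaf(n=99, m=0, T=50, eta=0.1, lam=1000.0,
                 w0=w0, w_good=w_good, w_base=w_base)
attacked = run_mpaf(n=99, m=1, T=50, eta=0.1, lam=1000.0,
                    w0=w0, w_good=w_good, w_base=w_base)
print(math.dist(clean, w_base), math.dist(attacked, w_base))
```

Here $\eta \cdot \frac{m}{n+m} \cdot \lambda = 0.1 \cdot 0.01 \cdot 1000 = 1$. A single fake client among 100 pulls the model most of the way to `w_base`, while the clean run converges near `w_good`. Much larger values of that product make this toy iteration unstable.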
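Empirically, $\lambda$ is chosen so that the fake update norm $\|\lambda\,(w' - w^t)\|$ is large relative to typical genuine update magnitudes while remaining plausible. The attack's effect saturates once $\lambda$ is sufficiently large, indicating insensitivity to the exact learning rate or the scale of genuine gradients (Cao et al., 2022).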
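The sketch below illustrates one mechanism consistent with saturation, at least for the coordinate-wise median. It is our own illustration with hypothetical numbers, not an experiment from the paper. Once $\lambda (w'_j - w^t_j)$ lies beyond every genuine value on a coordinate, increasing $\lambda$ further no longer changes the median. For FedAvg, and for trimmed-mean with $m > k$, the aggregate instead keeps growing with $\lambda$.

```python
from statistics import median

# Hypothetical values of one coordinate from seven genuine updates.
BENIGN = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
DIRECTION = 0.5  # hypothetical w'_j - w^t_j for this coordinate
M_FAKE = 3       # hypothetical number of fake clients sampled this round

def median_with_scale(lam: float) -> float:
    """Coordinate-wise median when M_FAKE fake clients each send lam * DIRECTION."""
    return median(BENIGN + [lam * DIRECTION] * M_FAKE)

for lam in (0.1, 1.0, 10.0, 1e3, 1e6):
    print(lam, median_with_scale(lam))
```

With $\lambda = 0.1$ the fake value sits inside the genuine range. From $\lambda = 1$ onward it exceeds every genuine value, and the median stays fixed.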
4. Experimental Evaluation and Empirical Impact
Experiments on MNIST, Fashion-MNIST, and Purchase datasets reveal the potency of MPAF:
- With FedAvg, as few as 1% fake clients suffice to degrade test accuracy to random chance (e.g., 10% for a 10-class task).
- With Median or Trimmed-mean, a small fraction of fake clients still causes a substantial accuracy drop on Purchase and MNIST, whereas baseline noise-injection attacks cause a considerably smaller drop.
- As the fraction of fake clients increases, degradation deepens (e.g., from a 32% to a 49% accuracy drop on Purchase).
- Reducing the participation rate $\beta$ down to $0.01$ (1% of clients per round) does not mitigate the attack: MPAF remains highly effective independent of the client sampling rate.
- Increasing the scaling factor $\lambda$ above 1 quickly maximizes the attack's effect. Efficacy is insensitive to the precise scale, since near-optimal performance is reached once $\lambda$ is moderately large.
- Norm clipping bounds each update's norm but cannot prevent the attack unless the clipping threshold $M$ is set so low that benign training quality also collapses. For practical values of $M$, MPAF still substantially degrades accuracy.
These results highlight how standard defenses are structurally vulnerable to coordinated persistent poisoning via fake clients.
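The norm-clipping result can be illustrated with a small sketch. Clipping rescales an update to norm at most $M$ (called `threshold` here) but preserves its direction. A clipped fake update therefore still points exactly at $w'$ and contributes a full-norm vector in that direction every round. The vectors and the helper names (`clip_update`, `cosine`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))

def clip_update(update: Vector, threshold: float) -> Vector:
    """Scale the update down to norm `threshold` if it is longer; keep its direction."""
    norm = l2_norm(update)
    if norm <= threshold:
        return list(update)
    return [x * threshold / norm for x in update]

def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b)) / (l2_norm(a) * l2_norm(b))

# Hypothetical global model, base model, and scaling factor.
w_t, w_base, lam = [0.0, 0.0], [3.0, 4.0], 10.0
direction = [b - x for x, b in zip(w_t, w_base)]   # w' - w^t
fake = [lam * d for d in direction]                # norm 50 before clipping
clipped = clip_update(fake, threshold=1.0)
print(clipped, l2_norm(clipped), cosine(clipped, direction))
```

The clipped update has norm 1 and cosine similarity 1 with $w' - w^t$. Clipping caps each fake client's per-round push, but the push itself persists.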
5. Limitations of Existing Defenses and Security Implications
The qualitative hallmark of MPAF is directional consistency: fake updates always point toward $w'$ and are perfectly synchronized. Classic robust aggregation and norm-clipping approaches are primarily designed for magnitude outliers or coordinate-wise extreme values. They are fundamentally limited when many clients persistently drag the global update in the same direction.
The paper demonstrates that:
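- Median and trimmed-mean rules are not immune. The synchronized malicious updates are identical, so they do not cancel one another out. Once the fake-client fraction $m/(n+m)$ is non-negligible, they shift the per-coordinate median and survive trimming whenever the number of fake clients $m$ exceeds the trimming parameter $k$.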
- Norm clipping is ineffective at realistic thresholds.
Table: Defense Impact Summary
| Aggregation Rule | Minimal Fake Fraction for Major Degradation | Test Accuracy Drop |
|---|---|---|
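| FedAvg | 1% | Down to random chance |
| Median, Trimmed-mean | Small fraction (see Section 4) | Substantial; e.g., 32% to 49% on Purchase as the fake fraction grows |
| Norm clipping (threshold $M$) | n/a | Substantial at values of $M$ that preserve benign accuracy |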
6. Future Research and Open Directions
Proposed mitigations and research priorities include:
- Development of aggregation or anomaly detection rules that analyze directional consistency across rounds, not just per-round magnitude outliers.
- Extension of MPAF to targeted poisoning (e.g., choosing $w'$ to encode a backdoor or trigger-specific behavior).
- Use of side-channel or auxiliary public data to validate global model direction and certify resistance to base-model drag.
- Exploration of formally provable defenses, such as ensemble methods with redundant computation or voting, and systematic resilience checks for fake-client influence.
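The first research direction above, analyzing directional consistency, can be sketched as a naive per-round heuristic. This is a hypothetical illustration, not a defense proposed or evaluated in the paper. It flags clients whose update is nearly parallel to at least `min_peers` other updates in the same round. Tracking which clients are flagged repeatedly would extend it across rounds.

Two caveats apply. Non-IID genuine clients can also produce correlated updates, so thresholds need care. An attacker could also add noise to break exact alignment.

```python
import math
from itertools import combinations
from typing import List, Set

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def flag_synchronized(updates: List[Vector], sim_threshold: float = 0.99,
                      min_peers: int = 2) -> Set[int]:
    """Flag indices whose update is nearly parallel to at least `min_peers` others."""
    peers = [0] * len(updates)
    for i, j in combinations(range(len(updates)), 2):
        if cosine(updates[i], updates[j]) >= sim_threshold:
            peers[i] += 1
            peers[j] += 1
    return {i for i, count in enumerate(peers) if count >= min_peers}

# Hypothetical round: four diverse genuine updates and three identical fake ones.
genuine = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
fakes = [[5.0, 5.0, 5.0]] * 3
print(flag_synchronized(genuine + fakes))
```

In this example only the three identical fake updates (indices 4, 5, 6) are flagged. The cost is quadratic in the number of sampled clients.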
The paper underscores that adversaries need not compromise genuine clients to achieve catastrophic sabotage in FL; fake-client injection with persistent, strategically aligned parameter updates can undermine standard defense assumptions, motivating a fundamental reevaluation of federated system security (Cao et al., 2022).