FedPoisonTTP: Threats in Federated Learning

Updated 1 December 2025

FedPoisonTTP is a threat model in federated learning that includes poisoning attacks by privileged adversaries targeting both training and test-time personalization.
It employs methods like label-flipping, feature-poisoning, and orchestrator manipulation to degrade model performance while evading conventional anomaly detection.
Detection frameworks such as label-flip diagnostics and client-side fingerprinting offer practical mitigation strategies against orchestrator-driven targeted overfitting.

FedPoisonTTP designates a family of threat models, attack strategies, and corresponding detection frameworks for data poisoning in federated learning (FL) systems, focusing on privileged adversaries (“trusted third parties,” TTPs), poisoned test-time personalization, and robust client-side overfitting detection. The term encompasses (i) poisoning strategies that degrade or hijack the global or per-client model during standard or adaptive FL, (ii) methods for synthesizing stealthy, in-distribution poisons using adversarial optimization, and (iii) detection frameworks—particularly for identifying orchestrator-driven targeted overfitting. This landscape includes both test-time and training-time settings and extends to the evaluation and mitigation of such threats across diverse model architectures and adaptation protocols (Iftee et al., 24 Nov 2025, Mestari et al., 15 Sep 2025, Nowroozi et al., 5 Mar 2024).

1. Threat Models and Adversary Capabilities

Three principal threat models underpin recent FedPoisonTTP research:

Byzantine Orchestrator (Targeted Overfitting): The orchestrator (server) selectively aggregates and distributes model updates so that a designated subset of clients $\mathcal{T}$ receive models that overfit their own local data while rendering global aggregation seemingly benign for other clients (Mestari et al., 15 Sep 2025). The orchestrator operates as “dishonest but trusted,” with full knowledge of aggregation and control over the update routing.
Grey-Box Test-Time Personalization Attacker: A compromised client (or small coalition) participates in federated test-time adaptation (FTTA), only observing broadcast model parameters $\{\theta^t\}$ . The adversary injects a small, stealthy fraction of crafted poison samples into its local test-time adaptation stream, constrained by per-sample norm bounds, label balance, and update-norm clipping to evade anomaly detection (Iftee et al., 24 Nov 2025).
White-Box Training-Time Poisoner: An attacker fully controls the data of a fraction $p$ of clients, leveraging knowledge of the model architecture, weights, and data distribution to corrupt local datasets either via label flipping or feature poisoning (Nowroozi et al., 5 Mar 2024).

In each setting, the adversary’s aim is to degrade global and/or targeted local model performance while evading common anomaly detection or outlier filtering mechanisms.

2. Attack Mechanisms

(A) Training-Time Data Poisoning

Label-Flipping (LF): For a client data batch $D = \{(x_j, y_j)\}_{j=1}^{N}$ , an $\alpha$ -fraction of samples is randomly selected; each label is inverted (binary flip or modular increment for multiclass). Following local training on $D_{poisoned}$ , the resulting model update, influenced by corrupted labels, is submitted to the aggregator (Nowroozi et al., 5 Mar 2024).
Feature-Poisoning (FP): A Random Forest classifier is trained on clean data to compute permutation-based feature importances. The top- $k$ important features are perturbed by replacing a fraction $p_f$ of their values with the normalized class-mean of the opposite class. This is performed by

$x_j[f] \gets \frac{\mu_{1-y_j} - \min_{f'}}{\max_{f'} - \min_{f'}}$

for $f$ in $F_{top}$ and appropriate class means, then yielding $D_{poisoned}$ for local model update (Nowroozi et al., 5 Mar 2024).

(B) Test-Time Personalization Poisoning

Surrogate Aggregator Distillation: Since the attacker does not observe other clients’ updates, a lightweight aggregator is distilled from historical global model sequences. Approximate contributions from non-attacker clients are estimated via moving average differences:

$\widehat{\Delta\theta}_{hist} = \frac{1}{k} \sum_{j=t-k+1}^t (\theta^j - \theta^{j-1})$

The attacker simulates post-aggregation outcomes for various candidate poisoned updates (Iftee et al., 24 Nov 2025).

Feature-Consistent In-Distribution Poison Synthesis: Poisoned samples are constrained to match per-layer feature moments (mean and variance) of a benign reference pool, minimizing

$\mathcal{L}_{reg} = \frac{1}{L} \sum_{\ell=1}^L \|\mu^\ell(\mathcal{D}_{poison}, \theta) - \mu^\ell(\mathcal{B}_{benign}, \theta)\|_2^2 + \beta \|\sigma^\ell(\mathcal{D}_{poison}, \theta) - \sigma^\ell(\mathcal{B}_{benign}, \theta)\|_2^2$

This regularization, coupled with perturbation norm constraints and class balancing, permits stealthy transfer across heterogeneous clients and helps poisons evade common outlier and norm-based detection (Iftee et al., 24 Nov 2025).

Attack Objectives:
- BN-Shift: Directly manipulates BatchNorm channel statistics during test-time adaptation.
- Notch High-Entropy (NHE): Encourages high model output entropy, except for the true class, maximizing confusion.
- Balanced Low-Entropy (BLE): Yields confident misclassifications uniformly across all but the true class, maintained through an auxiliary confidence penalty (Iftee et al., 24 Nov 2025).

(C) Targeted Overfitting via Aggregation Bias

The dishonestly-trusted orchestrator, at each round, computes two model aggregates: one over all participants for non-target clients and one over only the target subset $\mathcal{S}_{mal}^t$ for the victims. Over multiple rounds, this selective amplification drives $\mathcal{S}_{mal}^t$ models to overfit their own data, elevating their vulnerability to privacy attacks (Mestari et al., 15 Sep 2025).

3. Detection and Diagnostics

Detection of FedPoisonTTP attacks, especially orchestrator-driven targeted overfitting, is addressed by three optimizer-agnostic, client-side methods (Mestari et al., 15 Sep 2025):

Label-Flipping Poison Detection: Clients flip a small fraction $\alpha$ of labels in low-frequency classes and train locally. If aggregation is honest, the aggregated model degrades on the poisoned set, but if selective aggregation is in effect, accuracy remains high. The poison effectiveness score

$\mathrm{PES} = \mathrm{Acc}(M_{local}, \mathcal{D}_{flip}) - \mathrm{Acc}(M_{agg}, \mathcal{D}_{flip})$

reveals targeted overfitting when below a threshold.

Backdoor Trigger Detection: Clients train on data with backdoor triggers and measure whether trigger-label mapping persists post-aggregation. The mean accuracy on the triggered subset above a threshold suggests server favoritism.
Fingerprinting (Gradient/Weight): A secret fingerprint vector $s$ is added to the client’s local gradient or update. After aggregation, a dot product or cosine similarity with $s$ reveals non-dilution and thus targeted model favoritism. This is immediate and low-cost for single-client attacks but unreliable for multi-target scenarios.

These detection strategies can be calibrated for false-positive control and can prompt clients to opt out or alert a monitoring module upon suspicious anomalies.

4. Experimental Results and Evaluations

Test-Time Poisoning (FedPoisonTTP)

On CIFAR-10-C and CIFAR-100-C (using WideResNet-28 and ResNeXt-29 backbones, respectively), test-time adaptation with Tent or CoTTA and $M=5$ malicious clients out of $N=10$ yielded accuracy drops up to $17.3\%$ (CIFAR-10-C) and $29.9\%$ (CIFAR-100-C) under strong poison ratios. Smaller adaptation batch sizes amplified attack effect (Iftee et al., 24 Nov 2025).
The most effective objectives combined high-entropy/distribution-regularized or balanced-confident poisons with feature-consistency, evading update-norm anomaly detectors while achieving broad pollution of adaptation protocols.

Orchestrator-Driven Targeted Overfitting

Across MNIST, CIFAR-10, CIFAR-100, PathMNIST, and EuroSAT, single-target detection via label-flip or fingerprinting consistently attained $>95\%$ accuracy and round-1 latency (ResNet-18 tasks), while multi-target detection favored label-flip over fingerprinting (which collapses) (Mestari et al., 15 Sep 2025).
False positive rates and computational overheads vary: fingerprinting is $O(d)$ per round, whereas label-flip and backdoor require additional local retraining and small validation set maintenance.

Traditional Data Poisoning in FL

On CIC and UNSW datasets, feature-poisoning consistently degraded accuracy and attack success rate stealthily ( $\approx96\%$ at $1\%$ poisoning), while label-flipping caused abrupt, easily-detectable accuracy drops (e.g., CIC server accuracy $4.28\%$ at $1\%$ LF, attack success rate $95.64\%$ ) (Nowroozi et al., 5 Mar 2024).

5. Mitigation and Defense Strategies

Mitigations for FedPoisonTTP attacks are domain- and threat-model-specific:

Robust Aggregation: Defenses such as Krum, Median, and Trimmed Mean aggregation can reject statistical outliers in updates but may be insufficient when poisons are stealthily in-distribution or norm-bounded (Nowroozi et al., 5 Mar 2024).
Update Norm-Bounding: Enforcing $\|\Delta W^i\|_2 \leq \tau$ for all client updates limits the scale of perturbations but not their adaptability to the aggregation scheme (Iftee et al., 24 Nov 2025, Nowroozi et al., 5 Mar 2024).
Input/Feature-Level Monitoring: Detecting outlier statistics in feature moments or BN channel means/variances at test time can provide anomaly signals for poisoned adaptation sequences (Iftee et al., 24 Nov 2025).
Client-Side Validation/Hybrid Validation: Both client- and server-side holdout sets can cross-check post-aggregation performance, flagging significant accuracy or feature statistic deviations (Nowroozi et al., 5 Mar 2024, Iftee et al., 24 Nov 2025).
Differential Privacy: Randomization at the aggregation stage (via DP mechanisms) can limit per-client influence, especially in test-time adaptation (Nowroozi et al., 5 Mar 2024, Iftee et al., 24 Nov 2025).
Collaborative/Hybrid Defenses: Periodic inter-client validation or randomized shuffling/clipping in aggregation further distributes trust, especially relevant against orchestrator-centric threats (Mestari et al., 15 Sep 2025).

6. Significance and Open Challenges

FedPoisonTTP schemes demonstrate the importance of considering both sophisticated adversarial models (including privileged orchestrators and adaptive, stealthy participants) and the nuances of modern FL adaptations (e.g., test-time personalization) in threat modeling. These attack frameworks highlight the shortcomings of standard robust aggregation and anomaly detection, given coordinated or distribution-matched poisons.

Remaining challenges include:

Detecting adaptive multi-client attacks in the presence of heterogeneous data and partial participation.
Developing certified defenses for test-time adaptation where input distribution shifts are common.
Combining efficiently-computable local detection with principled aggregation schemes that minimize both utility loss and poisoning risk.
Achieving low-latency yet reliable detection of orchestrator-level attacks, especially when the adversary employs protocol-compliant, optimizer-agnostic strategies.

FedPoisonTTP provides unambiguous benchmarks for evaluating future defense mechanisms against sophisticated, real-world-motivated poisoning threats in federated and personalization-centric learning environments (Mestari et al., 15 Sep 2025, Iftee et al., 24 Nov 2025, Nowroozi et al., 5 Mar 2024).