Byzantine-Fault-Tolerant Federated Learning

Updated 14 January 2026
  • Byzantine-Fault-Tolerant Federated Learning (BFT-FL) is a robust distributed optimization framework designed to achieve reliable model convergence despite the presence of malicious or faulty clients.
  • It employs techniques like statistical filtering, dynamic weighting, and feature-based scoring to mitigate the impacts of Byzantine behaviors while addressing challenges such as data heterogeneity and privacy.
  • Empirical studies demonstrate that protocols like Comparative Elimination and FedTruth achieve high accuracy on benchmarks (e.g., MNIST, CIFAR-10) even with up to 40–50% adversarial participation.

Byzantine-Fault-Tolerant Federated Learning (BFT-FL) is a broad and technically diverse subfield addressing adversarial robustness in federated, distributed, and decentralized machine learning under the threat of Byzantine clients—participants who can arbitrarily deviate from protocol, collude, or send corrupted updates. BFT-FL frameworks seek to ensure reliable optimization, convergence, and accuracy properties even when a (potentially significant) fraction of agents are malicious or faulty, often under additional constraints regarding privacy, data heterogeneity, and scalability.

1. Foundations: Problem Setting and Fault-Tolerance Criteria

BFT-FL frameworks revolve around distributed optimization in the presence of up to $f$ Byzantine agents among $N$ total participants. Each agent $i$ holds a local cost function $q^i:\mathbb{R}^d\to\mathbb{R}$ (or $f_i$ in the classic notation) defined on its private data, and the system's aim is to find or approximate

$x^\star = \arg\min_{x\in\mathbb{R}^d} \sum_{i\in\mathcal{H}} q^i(x)$

where $\mathcal{H}$ is the latent set of honest agents, $|\mathcal{H}|\ge N-f$ (Gupta et al., 2021).

The canonical security notion is exact fault-tolerance: iterates of honest agents must converge to the minimizer determined by honest costs, regardless of the actions of Byzantine agents. Achievability is tightly linked to the notion of $2f$-redundancy: the property that any $N-2f$ honest agents suffice to identify the same global minimizer, i.e.,

$\arg\min_{x}\sum_{i\in S}q^i(x) = \arg\min_{x}\sum_{i\in\mathcal{H}}q^i(x)$

for every $S\subseteq\mathcal{H}$ with $|S|=N-2f$.

This property is both necessary and sufficient for exact robust aggregation in classical synchronous, centralized settings. When only stochastic gradients are accessible ($G^i$ are unbiased but noisy), approximate fault-tolerance holds: the optimality gap admits a bias governed by the variance $\sigma^2$ and the Byzantine fraction $f/(N-f)$.
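As a toy illustration of $2f$-redundancy (not taken from the cited papers), consider scalar quadratic costs $q^i(x) = (x - a_i)^2$, for which the minimizer of any subset sum is simply the mean of that subset's $a_i$. The sketch below checks the redundancy condition directly: with identical local minimizers every $(N-2f)$-subset agrees on one global minimizer, while heterogeneous data breaks the condition.

```python
import numpy as np
from itertools import combinations

# Toy check of 2f-redundancy with quadratic costs q^i(x) = (x - a_i)^2,
# whose subset-sum minimizer is the mean of the a_i over the subset.
N, f = 7, 2
a_redundant = np.full(N, 3.0)        # identical local minimizers: redundancy holds
a_hetero = np.linspace(0.0, 6.0, N)  # heterogeneous data: redundancy fails

def subset_minimizers(a, size):
    """Set of distinct minimizers over all subsets of the given size."""
    return {round(float(np.mean([a[i] for i in S])), 9)
            for S in combinations(range(len(a)), size)}

# 2f-redundancy: every (N - 2f)-subset of honest costs shares one minimizer.
assert len(subset_minimizers(a_redundant, N - 2 * f)) == 1
assert len(subset_minimizers(a_hetero, N - 2 * f)) > 1
```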

2. Core Methodological Strategies

Multiple algorithmic paradigms have emerged for BFT-FL:

2.1 Robust Aggregation via Statistical Filtering

Filter-based aggregation rejects or down-weights outlier updates using distance or robust statistics (Bouhata et al., 2022):

  • Krum/GeoMed: select the update closest to the others in the $\ell_2$ metric (Krum) or the geometric median (GeoMed); Krum requires $N > 2f+2$ participants (Bhattacharya et al., 2024).
  • Trimmed-Mean/Coordinate-wise Median: coordinate-wise ordering and truncation, robust up to $f < n/2$.
  • Bulyan: hierarchical, combining Krum and trimmed-mean for higher resilience.
  • Comparative Elimination (CE) (Gupta et al., 2021): at each round, sort local model iterates by distance from the previous global model, discard the $f$ farthest, and average the $N-f$ survivors.
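The first three filtering rules above can be sketched in a few lines of NumPy. This is a simplified illustration, not the reference implementation of any cited paper; in particular the Krum scoring here uses the common formulation that sums squared distances to the $n-f-2$ nearest neighbours.

```python
import numpy as np

def krum(updates, f):
    """Krum (sketch): return the update with the smallest sum of squared
    distances to its n - f - 2 nearest neighbours."""
    n = len(updates)
    dists = np.array([[np.sum((u - v) ** 2) for v in updates] for u in updates])
    # Sort each row; skip index 0 (distance to self), keep n - f - 2 neighbours.
    scores = [np.sort(row)[1:n - f - 1].sum() for row in dists]
    return updates[int(np.argmin(scores))]

def coordinate_median(updates):
    """Coordinate-wise median across client updates."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, f):
    """Coordinate-wise trimmed mean: drop the f largest and f smallest
    values in each coordinate, then average the rest."""
    s = np.sort(np.stack(updates), axis=0)
    return s[f:len(updates) - f].mean(axis=0)
```

With five honest updates near the all-ones vector and one Byzantine update at 100, all three rules return an aggregate close to the honest cluster.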

2.2 Dynamic Weighting and Truth Discovery

Rather than hard-clipping, some frameworks implement dynamic, optimization-based estimation of both the consensus update and client reliability:

  • FedTruth (Ebron et al., 2023): solves a convex program alternating between minimizing deviation from the consensus and inferring client reliabilities via a negative-entropy regularizer on weights, suppressing the influence of persistent outliers.
  • Robust-FL (Li et al., 2022): constructs a historical estimator for the "expected" next global model via exponential smoothing, clustering updates by distance to this estimator, with adaptive acceptance of only those within a dynamically estimated threshold.
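A minimal sketch in the spirit of Robust-FL's history-based filtering follows; the smoothing constant, the quantile-based acceptance threshold, and the function name are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def smoothed_filter(history, updates, beta=0.5, quantile=0.5):
    """Sketch of history-based filtering (in the spirit of Robust-FL,
    Li et al., 2022): predict the next global model by exponentially
    smoothing past global models, then accept only updates within an
    adaptively chosen distance of that prediction."""
    # Exponential smoothing over the model history (oldest first).
    est = history[0]
    for m in history[1:]:
        est = beta * est + (1 - beta) * m
    dists = np.array([np.linalg.norm(u - est) for u in updates])
    # Adaptive acceptance threshold (quantile rule is an assumed stand-in
    # for the paper's dynamically estimated threshold).
    thresh = np.quantile(dists, quantile)
    accepted = [u for u, d in zip(updates, dists) if d <= thresh]
    return np.mean(accepted, axis=0)
```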

2.3 Feature-Space and Consistency Scoring

Recently, "feature-driven" approaches leverage learned (or virtual) features to discriminate poisoned models:

  • Consistency scoring via virtual samples (Lee et al., 2024): probe all candidate updates using a set of server-generated virtual data. Models are grouped via pairwise feature-consistency (cosine similarity); those with lowest alignment are presumed Byzantine.
  • Dummy-contrastive aggregation (Lee et al., 2022): generate synthetic "dummy" inputs, extract features under each model, and score deviations from anchor (previous model) projections to spot outliers.
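The shared idea in both approaches can be distilled into a pairwise feature-consistency score. The sketch below (a simplified stand-in, not the cited papers' exact scoring) assumes the server has already extracted one feature vector per candidate model from its virtual/dummy probes.

```python
import numpy as np

def consistency_scores(features):
    """Score each candidate model by its mean pairwise cosine similarity
    to the other candidates, computed on features extracted from
    server-side virtual/dummy probe inputs (simplified sketch)."""
    F = np.stack([f / np.linalg.norm(f) for f in features])  # unit-normalise
    sim = F @ F.T                                            # pairwise cosine similarity
    n = len(F)
    return (sim.sum(axis=1) - 1.0) / (n - 1)                 # exclude self-similarity
```

Updates whose feature behaviour is least aligned with the group (lowest score) are flagged as suspected Byzantine and excluded before aggregation.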

2.4 Coding and Redundancy

Some protocols employ gradient coding or redundant computation:

  • DRACO, RRR-BFT (Bouhata et al., 2022): distribute encoded (redundant) partial gradients such that honest gradients can be decoded even if up to $f$ components are Byzantine.
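In the simplest replication-based instance of this idea, each data shard's gradient is computed by $2f+1$ workers, so a per-shard majority vote recovers the honest value even if up to $f$ replicas are corrupted. The decoder below is a toy sketch of that replication scheme, not the coded construction used by DRACO itself.

```python
import numpy as np
from collections import Counter

def majority_decode(replicas):
    """Replication-style redundancy sketch: given 2f + 1 replicas of one
    shard's gradient, return the majority value, which is the honest
    gradient whenever at most f replicas are Byzantine."""
    counts = Counter(tuple(np.round(g, 6)) for g in replicas)
    value, _ = counts.most_common(1)[0]
    return np.array(value)
```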

2.5 Distributed and Decentralized Consensus Protocols

In decentralized/topology-heterogeneous settings, robust consensus protocols replace central aggregation:

  • PDMM-based BFT (Xia et al., 13 Mar 2025): employs the Primal-Dual Method of Multipliers, leveraging quadratic penalties and symmetry to iteratively force consensus while limiting the effect of arbitrary deviations.
  • Topology-aware DFL (Bhattacharya et al., 2024): adapts aggregation to local neighbor sets, highlighting vulnerabilities of classic rules in sparse or hub-dominated graphs.

2.6 Blockchain and Cryptography

Decentralized, tamper-resistant and privacy-preserving aggregation protocols:

  • Blockchain-based B-FL (Yang et al., 2022): implements multi-Krum robust aggregation via a PBFT blockchain consensus layer among multiple edge servers to resist both device and server-level Byzantine faults.
  • ByITFL (Xia et al., 2024): combines FLTrust trust-score robustification with polynomial approximation, Lagrange-coded and secret-shared updates, achieving full information-theoretic client-privacy against both the server and colluding user sets.

3. Representative Protocols and Key Guarantees

3.1 Comparative Elimination (CE) for Federated Local SGD

CE (Gupta et al., 2021) addresses the open question of achieving exact BFT for local SGD in federated settings. At each round:

  1. Honest agents perform $T$ local updates, returning local iterates $x^{i}_{k,T}$.
  2. The server computes $d_i = \|x^{i}_{k,T} - x^{k}\|$, sorts, keeps the $N-f$ closest updates, and discards the $f$ farthest.
  3. The average of the survivors is the new global model.
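The server-side step of one CE round can be sketched directly from this description (a minimal illustration of the rule in Gupta et al., 2021, omitting the local-SGD phase):

```python
import numpy as np

def comparative_elimination(prev_global, local_iterates, f):
    """One Comparative Elimination aggregation step: sort local iterates
    by distance to the previous global model, discard the f farthest,
    and average the N - f survivors."""
    d = np.array([np.linalg.norm(x - prev_global) for x in local_iterates])
    keep = np.argsort(d)[: len(local_iterates) - f]
    return np.mean([local_iterates[i] for i in keep], axis=0)
```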

Under $2f$-redundancy and strong convexity,

  • Deterministic gradients: if $\rho = f/(N-f) \leq \mu/(3L)$ (where $\mu$ is the strong-convexity constant and $L$ the smoothness constant), CE achieves linear convergence to $x^\star$.
  • Stochastic gradients: introduces a bias of $O(\sigma^2\alpha + \sigma^2 f/(N-f))$ but still achieves $O(\sigma^2/(\mu k) + f/N)$ rates in expectation.

3.2 Dynamic Weight Aggregation (FedTruth)

FedTruth (Ebron et al., 2023) models server aggregation as

$\min_{\Delta^*,\,p} \sum_{k=1}^n -\log p_k \, \frac{\|\Delta_k-\Delta^*\|_2^2}{\sigma^2}$

subject to $\sum_k p_k = 1$, $p_k \geq 0$. Alternating updates provide closed-form solutions, down-weighting persistent outliers. Robustness is achieved as long as fewer than $50\%$ of participants are Byzantine.
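A generic truth-discovery-style alternation can be sketched as follows. This is a simplified stand-in for FedTruth: the closed form $w_k \propto \exp(-d_k/\lambda)$ is the standard solution under a negative-entropy regulariser on the weights, and the normalisation of deviations by their mean is an assumption added here for numerical stability; the exact FedTruth objective differs in detail.

```python
import numpy as np

def truth_discovery_aggregate(updates, rounds=10, lam=1.0):
    """Alternating scheme in the spirit of FedTruth (Ebron et al., 2023):
    alternate between (i) a reliability-weighted consensus update and
    (ii) entropy-regularised reliability weights w_k ∝ exp(-d_k / λ)."""
    U = np.stack(updates)
    w = np.full(len(U), 1.0 / len(U))          # start from uniform reliability
    for _ in range(rounds):
        consensus = (w[:, None] * U).sum(axis=0) / w.sum()   # (i) consensus step
        d = ((U - consensus) ** 2).sum(axis=1)               # squared deviations
        # (ii) reliability step; scaling by the mean deviation is an
        # assumed normalisation to keep the exponentials well-conditioned.
        w = np.exp(-d / (lam * (d.mean() + 1e-12)))
        w /= w.sum()
    return consensus
```

After a few rounds the persistent outlier's weight collapses and the consensus settles near the honest cluster.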

3.3 Feature-based Byzantine Detection

Server-side feature scoring (Lee et al., 2024, Lee et al., 2022) computes representation-space deviations under synthetic or virtual data. Robust aggregation is performed only among updates whose "behavior" in feature space is sufficiently consistent. This plug-in mechanism can materially improve robustness for a broad class of federated optimization algorithms.

4. Empirical Performance, Limitations, and Practical Considerations

4.1 Performance Benchmarks

Protocols such as CE (Gupta et al., 2021), FedTruth (Ebron et al., 2023), and various feature-based methods (Lee et al., 2024, Lee et al., 2022) have been validated across MNIST, CIFAR-10, Fashion-MNIST, medical imaging (Lee et al., 2024), and other benchmarks, under both random and structured poisoning attacks (e.g., sign-flip, model-boosting, backdoor, and Gaussian noise).

Key findings include:

  • CE achieves exact (linear-rate) convergence for local SGD under deterministic gradients and $2f$-redundancy with moderate condition number $\kappa$ and small $f/N$ (Gupta et al., 2021).
  • FedTruth maintains main-task accuracy $\ge 95\%$ and robust convergence with up to $40\%$ Byzantine/adversarial participation (Ebron et al., 2023).
  • Consistency scoring plug-ins preserve base FL convergence rates and deliver 60–70% higher accuracy than vanilla methods under $30\%$ targeted or untargeted model poisoning (Lee et al., 2024).
  • Practical and cryptographic schemes such as ByITFL (Xia et al., 2024) match non-private robust aggregation in both accuracy and privacy, tolerating up to $b/n = 50\%$ Byzantine participation subject to parameter settings.

4.2 Limitations and Open Directions

  • Many robust aggregators incur non-trivial computational cost: e.g., $O(n^2 d)$ per round for Krum, or $O(K^2 N)$ forward passes per round in plug-in feature approaches (Lee et al., 2024).
  • Some schemes require prior knowledge of $f$ or upper bounds on participation, while others (notably Robust-FL (Li et al., 2022)) remove this assumption using adaptive clustering.
  • In highly non-i.i.d. regimes, classic robust aggregation can degrade, necessitating more sophisticated, topology-aware, or feature-driven filters (Bhattacharya et al., 2024).
  • Sparse or hub-dominated decentralized networks expose vulnerabilities due to limited honest neighborhood size; topology-aware rules are a current area of development (Bhattacharya et al., 2024).
  • Cryptographic protocols (e.g., ByITFL (Xia et al., 2024)) offer strongest privacy but have significant computation/communication overhead; integrating efficient privacy with robustness is a continual challenge.

5. Beyond Centralized Settings: Decentralization, Privacy, and Heterogeneity

Recent advances extend BFT-FL into more realistic environments, most prominently via the decentralized consensus protocols of Section 2.5, the blockchain and cryptographic schemes of Section 2.6, and the heterogeneity- and topology-aware filters discussed in Section 4.2.

6. Theoretical Guarantees and Fundamental Limits

Rates of convergence, resilience bounds, and sample complexity in BFT-FL are tightly linked to underlying assumptions:

  • For most classical robust aggregators, $f \leq n/2$ (sometimes $f \leq n/3$) is the maximal tolerable Byzantine fraction under strong assumptions on the statistical diversity of honest updates (Bouhata et al., 2022).
  • Feature-driven, clustering, or dynamic-weighting approaches can handle up to or even above $50\%$ in empirical studies, though without hard theoretical guarantees (Li et al., 2022).
  • The tightness of $2f$-redundancy for exact BFT holds in both theory and practice (Gupta et al., 2021).
  • In distributed optimization (PDMM (Xia et al., 13 Mar 2025)) and federated RL (Jordan et al., 2024), explicit sample complexity and bias bounds are available, matching non-Byzantine baselines up to additive terms proportional to the fraction and amplitude of Byzantine perturbations.

7. Future Directions and Open Questions

  • Topology-awareness: Real-world large-scale, heterogeneous networks require aggregation rules sensitive to dynamic, non-fully-connected topologies (Bhattacharya et al., 2024).
  • Adaptive filtering: Real-time estimation of the number of Byzantines, or hybrid schemes combining multiple filters, are promising for practical systems (Lee et al., 2024).
  • Scalability: Sublinear aggregation costs (e.g., sketch-based median, compressed updates, lightweight privacy) are increasingly critical as deployment scale increases.
  • Asynchrony: Developing BFT-FL protocols that overcome the straggler or staleness problem without sacrificing robustness is key (Cox et al., 2024).
  • Integration of privacy, heterogeneity, and security: Simultaneously achieving DP, robustness to extreme heterogeneity, and BFT at scale remains an active area, as illustrated by hybrid trust/fingerprint modules (Karami et al., 31 Jul 2025, Nie et al., 2024).
  • Statistical and adversarial lower bounds: Formalizing information-theoretic limits for both classical and feature-based BFT-FL, in both central and decentralized settings, is ongoing.

In summary, Byzantine-Fault-Tolerant Federated Learning encompasses a diverse toolkit of algorithms including robust aggregation (statistical, distance-based, feature-space, optimization-based), coding methods, consensus/cryptography, and adaptive trust assignment. These protocols collectively provide strong theoretical and empirical guarantees for accuracy and convergence in cooperative ML under adversarial, heterogeneous, and privacy-sensitive environments, though substantial open challenges remain in scaling, adaptivity, and formal statistical characterization of their ultimate limits (Gupta et al., 2021, Ebron et al., 2023, Lee et al., 2024, Xia et al., 13 Mar 2025, Nie et al., 2024).
