Decentralized Federated Learning
- Decentralized Federated Learning is a distributed ML paradigm in which clients collaboratively train models on private data without a central server, enhancing privacy and resilience.
- It leverages diverse network topologies, such as mesh, ring, and hierarchical, to optimize convergence rates and reduce communication overhead.
- Advanced protocols like decentralized SGD, consensus ADMM, and robust aggregation ensure scalability, security, and effective handling of data heterogeneity.
Decentralized Federated Learning (DFL) is a distributed machine learning paradigm in which clients collaboratively train models using private data, exchanging knowledge directly in a peer-to-peer fashion or via distributed ledgers, thus eliminating the need for a centralized server. This design mitigates single-point-of-failure risks, enhances system resilience, and strengthens privacy guarantees by obviating the central orchestrator present in classical federated learning. DFL encompasses diverse architectural, algorithmic, and security innovations, and is increasingly employed across edge computing, industry, healthcare, and other privacy-sensitive domains (Yuan et al., 2023, Gabrielli et al., 2023, Hallaji et al., 25 Jan 2024).
1. Fundamental Models and Topologies
DFL formalizes collaborative optimization as a consensus problem over a graph $G = (V, E)$. Each of $N$ clients $i \in V$ holds local data $\mathcal{D}_i$ and maintains a model $w_i$. The global objective is typically

$$\min_{w_1,\dots,w_N}\; \sum_{i=1}^{N} p_i\, F_i(w_i) \quad \text{s.t.}\quad w_i = w_j \;\; \forall (i,j)\in E, \qquad F_i(w) = \mathbb{E}_{\xi\sim\mathcal{D}_i}\big[\ell(w;\xi)\big],$$

where $p_i$ are aggregation weights, $F_i$ is the local loss, and $G$ is the physical or logical network topology (Yuan et al., 2023, Gabrielli et al., 2023).
Topological structures include:
- Fully connected (mesh): Rapid consensus but high per-round communication.
- Ring, line, star: Lower node degree, trading slower consensus for lower per-round communication.
- Dynamic/random graphs: Adapt to node churn, mobility, or bandwidth constraints.
- Hierarchical/multi-level: Aggregators coordinate subgroups, combining DFL with clustered FL (Yuan et al., 2023).
Topological choice governs the convergence rate, fault tolerance, and communication complexity, with spectral properties of the mixing matrix $W$ (derived from $G$) controlling error decay: the consensus error contracts as $O(\lambda_2^{\,t})$, where $\lambda_2$ is the second largest eigenvalue magnitude of $W$ (Yuan et al., 2023).
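To make the role of the mixing matrix concrete, the following minimal sketch (NumPy only; the helper name `ring_mixing_matrix` is an illustrative assumption, not from the cited works) builds Metropolis-Hastings mixing weights for a ring of $n$ clients and reports the second-largest eigenvalue magnitude that governs the error decay described above.

```python
import numpy as np

def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic Metropolis-Hastings weights for a ring of n nodes."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            W[i, j] = 1.0 / 3.0           # every node has degree 2 -> weight 1/(max_deg + 1)
        W[i, i] = 1.0 - W[i].sum()        # remaining mass stays on the diagonal
    return W

W = ring_mixing_matrix(16)
eigvals = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
lambda_2 = eigvals[1]                     # second-largest eigenvalue magnitude
print(f"spectral gap 1 - lambda_2 = {1.0 - lambda_2:.4f}")  # smaller gap -> slower consensus
```

Denser topologies (e.g., fully connected graphs) yield a larger spectral gap and hence faster error decay, at the cost of more per-round communication.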
2. Core Algorithms and Protocols
Canonical DFL adopts local training on private data, followed by model aggregation across neighbors. Common procedures include:
- Decentralized SGD (DSGD): Each client takes a local stochastic gradient step and then averages with its neighbors, $w_i^{t+1} = \sum_{j \in \mathcal{N}(i)\cup\{i\}} W_{ij}\,\big(w_j^{t} - \eta\, \nabla F_j(w_j^{t};\xi_j^{t})\big)$ (a minimal sketch appears at the end of this section).
- Consensus ADMM: Primal-dual optimization over connected graphs, supporting more general objectives and constraints (Yuan et al., 2023, Gabrielli et al., 2023).
- Gossip Averaging: Each node exchanges and mixes parameters with a random neighbor.
- Gradient Tracking: Nodes maintain surrogate gradient states to correct bias introduced by heterogeneity ($y_i^t$ tracks the global average gradient $\tfrac{1}{N}\sum_{j} \nabla F_j(w_j^t)$) (Gao et al., 2023).
- Push-Sum on Directed Graphs: De-biased parameter sharing for asymmetric (column-stochastic) network structures (Li et al., 2023).
- Event-triggered protocols: Clients transmit updates only upon significant local changes or resource-awareness, reducing communication (Zehtabi et al., 2022).
Aggregation rules (median, trimmed mean, Krum, FedProx, Zeno, MultiKRUM) are employed to mitigate Byzantine attacks and to generalize FedAvg beyond centralized settings (Beltrán et al., 2023).
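The DSGD sketch referenced above follows, assuming NumPy only; `dsgd_round` and `local_gradient` are illustrative names, with `local_gradient` standing in for whatever stochastic gradient a client computes on its private data.

```python
import numpy as np

def dsgd_round(models, W, local_gradient, lr=0.1):
    """One synchronous DSGD round: local gradient step on private data, then neighbor mixing.

    models:         (n_clients, dim) array of current local parameters
    W:              (n_clients, n_clients) row-stochastic mixing matrix (W[i, j] = 0 for non-neighbors)
    local_gradient: callable(client_id, params) -> gradient computed on that client's private data
    """
    n = models.shape[0]
    stepped = np.stack([models[i] - lr * local_gradient(i, models[i]) for i in range(n)])
    return W @ stepped                    # each client averages its neighbors' stepped models

# toy usage: quadratic local losses f_i(w) = 0.5 * ||w - c_i||^2 with heterogeneous optima c_i,
# clients arranged on a ring with uniform mixing weights
n, d = 16, 4
W = np.zeros((n, n))
for i in range(n):
    for j in (i, (i - 1) % n, (i + 1) % n):
        W[i, j] = 1.0 / 3.0
rng = np.random.default_rng(0)
targets = rng.normal(size=(n, d))
models = np.zeros((n, d))
for _ in range(300):
    models = dsgd_round(models, W, lambda i, w: w - targets[i])
print("spread across clients:", np.linalg.norm(models - models.mean(axis=0)))
```

No raw data ever leaves a client; only model parameters are exchanged with topological neighbors, which is the defining communication pattern of DSGD-style DFL.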
3. Communication, Scalability, and Efficiency
DFL aims to suppress communication bottlenecks of centralized FL. Critical factors include:
- Bandwidth utilization: Decentralized segmented gossip enables model parallelism across diverse links, ensuring node bandwidth saturation and linear speedups relative to centralized protocols (Hu et al., 2019).
- Synchronization schemes: Synchronous rounds can incur straggler latency. Asynchronous and event-triggered protocols increase overall efficiency, at the cost of more intricate convergence analysis (a minimal sketch of event-triggered communication follows this list) (Zehtabi et al., 2022, Yuan et al., 2023).
- Communication cost: Peer-to-peer and push-sum schemes reduce per-round data transfer to a small, constant number of neighbor exchanges per client, on the order of the proxy-model size when compact proxy models are shared and of the full model size for dense models, rather than the $O(N)$ model uploads concentrated at the server in server-centric FL (Kalra et al., 2021, Hu et al., 2019, Gabrielli et al., 2023).
- Overhead: Blockchain-based approaches add transaction and consensus costs, which may become significant at scale but confer auditability and incentive compatibility (Zhang et al., 2023, Ghanem et al., 2021, ChaoQun, 2022, S et al., 26 Apr 2025).
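As an illustration of the event-triggered idea referenced above, the following minimal sketch (NumPy only; the threshold value and class/method names are assumptions, not any cited paper's API) has a client broadcast its parameters to neighbors only when they have drifted sufficiently from the last transmitted copy.

```python
import numpy as np

class EventTriggeredClient:
    """Client that broadcasts its model only when it changed enough since the last send."""

    def __init__(self, params: np.ndarray, threshold: float = 0.5):
        self.params = params.copy()
        self.last_sent = params.copy()
        self.threshold = threshold

    def maybe_broadcast(self):
        """Return current parameters if drift exceeds the threshold, else None (no message)."""
        drift = np.linalg.norm(self.params - self.last_sent)
        if drift > self.threshold:
            self.last_sent = self.params.copy()
            return self.params
        return None                       # suppressed transmission saves bandwidth

# toy usage: parameters drift each round; messages go out only on large changes
rng = np.random.default_rng(1)
client = EventTriggeredClient(np.zeros(8))
sent = 0
for _ in range(100):
    client.params += 0.1 * rng.normal(size=8)   # stand-in for a local training step
    if client.maybe_broadcast() is not None:
        sent += 1
print(f"transmitted in {sent}/100 rounds")
```

The threshold trades communication savings against the staleness of the information neighbors hold, which is exactly the tradeoff the convergence analyses of event-triggered protocols quantify.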
4. Security, Trust, and Privacy Mechanisms
Decentralization both mitigates and introduces new attack surfaces. Key mechanisms include:
- Blockchain integration: Client updates, auditor verification, and reputation are managed via on-chain smart contracts, typically secured by PBFT or other BFT protocols; auditors validate model updates and deter poisoning (Zhang et al., 2023, ChaoQun, 2022, Ghanem et al., 2021, Hallaji et al., 25 Jan 2024).
- Committee and reputation: Clients or aggregators are scored for trustworthiness based on model performance, update quality, or token stake, with low-reputation actors filtered from aggregation or penalized (Zhang et al., 2023, ChaoQun, 2022).
- Differential privacy and secure aggregation: DP noise injection on gradients or weights bounds data leakage; secure multiparty or homomorphic protocols further restrict visibility during aggregation (Zhang et al., 2023, Kalra et al., 2021, Hallaji et al., 25 Jan 2024, ChaoQun, 2022).
- Model/gradient privacy: Proxy models and mutual learning avoid the need to transmit raw weights or data, reducing vulnerability to inversion, membership inference, and adversarial manipulation (Wittkopp et al., 2021, Kalra et al., 2021).
- Robust aggregators: Median, trimmed mean, MultiKRUM, and other Byzantine-resilient rules limit the influence of outlier updates during aggregation (a minimal sketch appears at the end of this section) (Gabrielli et al., 2023, Beltrán et al., 2023, Hallaji et al., 25 Jan 2024).
- Incentive and slashing mechanisms: Clients and validators are rewarded or slashed for update quality or malicious behavior via smart contracts (Zhang et al., 2023, ChaoQun, 2022, Ghanem et al., 2021).
Attack models include Byzantine (malicious) participants, honest-but-curious adversaries, and consensus attacks on blockchain, with analytical bounds on subversion and defense success probabilities (Hallaji et al., 25 Jan 2024).
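To illustrate the robust aggregators referenced above, a minimal sketch of coordinate-wise median and trimmed-mean aggregation follows (NumPy only; the trimming fraction and the toy attack are illustrative assumptions, not recommendations from the cited works).

```python
import numpy as np

def coordinate_median(updates: np.ndarray) -> np.ndarray:
    """Coordinate-wise median over client updates of shape (n_clients, dim)."""
    return np.median(updates, axis=0)

def trimmed_mean(updates: np.ndarray, trim_frac: float = 0.2) -> np.ndarray:
    """Drop the lowest and highest trim_frac of values per coordinate, then average the rest."""
    n = updates.shape[0]
    k = int(np.floor(trim_frac * n))
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[k:n - k].mean(axis=0)

# toy usage: 8 honest clients near the true update, 2 Byzantine clients sending garbage
rng = np.random.default_rng(2)
honest = rng.normal(loc=1.0, scale=0.1, size=(8, 4))
byzantine = np.full((2, 4), 50.0)
updates = np.vstack([honest, byzantine])
print("plain mean:   ", updates.mean(axis=0))        # pulled far off by the attackers
print("coord. median:", coordinate_median(updates))  # stays near the honest value
print("trimmed mean: ", trimmed_mean(updates))
```

In DFL each node runs such a rule over the updates received from its neighbors, so resilience holds locally without any trusted central aggregator.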
5. Model and Data Heterogeneity
DFL frameworks accommodate heterogeneity in both model architectures and data distributions:
- Mutual learning/distillation: Clients exchange probabilistic knowledge or logits via knowledge transfer, supporting arbitrary architectures and robust adaptation under severe non-IID splits (see the distillation sketch after this list) (Khalil et al., 2 Feb 2024, Wittkopp et al., 2021).
- Proxy/teacher-student protocols: Models share synthetic representations, not parameters, yielding strong privacy and fast cold-start adaptation (Wittkopp et al., 2021, Kalra et al., 2021).
- Personalized aggregation: Clients select neighbors dynamically or tailor aggregation weights via game-theoretic or scoring approaches, optimizing individual prediction performance (Behera et al., 5 Oct 2024).
- Zero-shot decentralized FL: Prompt sharing between clients enables adaptation of large vision-language models with dramatically reduced communication (a reported 118× improvement) and competitive generalization compared to centralized prompt aggregation (Masano et al., 30 Sep 2025).
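The distillation sketch referenced above illustrates the logit-exchange idea behind mutual learning: a client regularizes its local predictions toward a peer's softened class probabilities via a KL term. NumPy only; the temperature, weighting, and function names are illustrative assumptions rather than the exact formulations of the cited works.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mutual_learning_loss(local_logits, peer_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on local labels plus KL(peer || local) on temperature-softened logits.

    Only logits cross the network, so raw data and model weights stay private,
    and the peer may use an entirely different architecture.
    """
    p_local = softmax(local_logits, T)
    p_peer = softmax(peer_logits, T)
    ce = -np.log(softmax(local_logits)[np.arange(len(labels)), labels]).mean()
    kl = (p_peer * (np.log(p_peer + 1e-12) - np.log(p_local + 1e-12))).sum(-1).mean()
    return (1 - alpha) * ce + alpha * (T ** 2) * kl

# toy usage on a 3-class batch of 4 examples
rng = np.random.default_rng(3)
local_logits = rng.normal(size=(4, 3))
peer_logits = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])
print("mutual learning loss:", mutual_learning_loss(local_logits, peer_logits, labels))
```

Because only class-probability vectors are exchanged, the per-round payload scales with the number of classes rather than the number of model parameters, which is what makes these schemes attractive for heterogeneous architectures.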
6. Theoretical Analysis and Empirical Results
Theoretical guarantees in DFL depend on network structure, update algorithms, and loss function convexity:
- Convergence rates: Under smoothness and convexity, DFL achieves rates on the order of $O(1/T)$ (DeceFL, DSGD) or $O(1/\sqrt{T})$ (non-convex loss) in function value and consensus error; the constants depend on the spectral gap or mixing time of the topology (see the illustrative bound after this list) (Yuan et al., 2021, Yuan et al., 2023, Gabrielli et al., 2023, Li et al., 2023, Zehtabi et al., 2022).
- Robustness: Empirical studies demonstrate resilience to 30% malicious clients, minimal drop in accuracy when reputation or robust aggregation is used, and scalability to hundreds of clients or extreme data fragmentation (Zhang et al., 2023, ChaoQun, 2022, Pérez et al., 23 Jul 2025, Kalra et al., 2021, Hu et al., 2019).
- Resource and communication efficiency: Adaptive event-triggered and asynchronous protocols halve communication time compared to classical gossip, with negligible accuracy loss (Zehtabi et al., 2022).
- Model consistency: Despite full decentralization, models converge with inter-client standard deviation below 1% in zero-shot prompt learning (Masano et al., 30 Sep 2025).
- Comparative performance: Decentralized architectures frequently achieve accuracy comparable to centralized FL baselines but with improved fault tolerance and substantial speedup on geo-distributed infrastructures (Beltrán et al., 2023, S et al., 26 Apr 2025, Hu et al., 2019).
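As an illustration of how the spectral gap enters these guarantees, a representative DSGD-style bound for smooth non-convex losses (schematic, not quoted verbatim from the cited works) has the form

$$\frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\big\|\nabla F(\bar{w}^{t})\big\|^{2} \;=\; O\!\left(\frac{1}{\sqrt{nT}}\right) \;+\; O\!\left(\frac{n}{T\,(1-\lambda_2)^{2}}\right),$$

where $\bar{w}^{t}$ is the network-average iterate, $T$ the number of rounds, $n$ the number of clients, and $1-\lambda_2$ the spectral gap of the mixing matrix. Better-connected topologies shrink the topology-dependent term, while the leading term matches centralized mini-batch SGD.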
7. Open Challenges and Research Directions
DFL presents several unresolved issues and directions for future work:
- Scalability: Efficient compression, quantization, and communication protocols for large models are required to support ultra-large client fleets (Gabrielli et al., 2023, Yuan et al., 2023).
- Blockchain overhead and consensus scalability: Latency and storage can bottleneck FL rounds as model and blockchain sizes increase (Zhang et al., 2023, ChaoQun, 2022, Ghanem et al., 2021, Hallaji et al., 25 Jan 2024).
- Formal guarantees under asynchrony, heterogeneity, and adversarial dynamics: Rigorous convergence analysis in asynchronous, dynamically evolving, or adversarial topologies remains only partially resolved (Yuan et al., 2023, Behera et al., 5 Oct 2024, Li et al., 2023).
- Privacy-utility tradeoff: Joint calibration of DP, secure aggregation, and learning utility for deep models and sensitive domains (Hallaji et al., 25 Jan 2024, Kalra et al., 2021, ChaoQun, 2022, Zhang et al., 2023).
- Incentive-compatible mechanisms: Robust token or reputation schemes for motivating truthful participation without central oversight (Zhang et al., 2023, ChaoQun, 2022).
- Personalization and vertical/horizontal heterogeneity: Tailoring DFL for personalized models, vertical federated settings, and cross-domain collaboration (Behera et al., 5 Oct 2024, Pérez et al., 23 Jul 2025, Khalil et al., 2 Feb 2024).
- Integration with trusted hardware: Use of trusted execution environments to further confine privacy exposure (Hallaji et al., 25 Jan 2024).
DFL represents a highly technical, rapidly evolving research frontier with a rich interplay between distributed optimization, security, privacy, and large-scale deployment considerations. The surveyed works outline concrete progress in decentralized architectures, robust aggregation, privacy enhancement, blockchain integration, and flexible, heterogeneous model learning (Yuan et al., 2023, Zhang et al., 2023, Gabrielli et al., 2023, Hallaji et al., 25 Jan 2024, Yuan et al., 2021, ChaoQun, 2022, Hu et al., 2019, Masano et al., 30 Sep 2025).