Semi-Decentralized Federated Learning
- SDFL is a distributed learning paradigm that organizes clients into clusters with local aggregators, enabling efficient intra-cluster consensus and scalable global synchronization.
- It employs hybrid communication strategies—combining device-to-device and device-to-server interactions—with asynchronous rounds and adaptive aggregation to improve robustness and reduce bandwidth overhead.
- SDFL underpins applications from energy-aware edge intelligence to blockchain-integrated trust models, addressing challenges in heterogeneity, security, and system scalability.
Semi-Decentralized Federated Learning (SDFL) is a distributed learning paradigm that bridges the gap between centralized federated learning (FL) architectures and fully decentralized, gossip-based learning schemes. SDFL organizes a large number of clients into clusters, typically with local aggregators (e.g., edge servers or peer-elected leaders), achieves local consensus or aggregation within each cluster, and then synchronizes the cluster aggregates globally, either via a parameter server or via peer-to-peer server-level coordination. This architectural design offers substantial improvements in communication efficiency, robustness to client and server failures, and scalability, particularly for large-scale, resource-constrained, and heterogeneous environments. SDFL models underpin practical systems and modern theoretical advances across energy-aware edge intelligence, trustworthy federated optimization, and communication-constrained learning.
1. Architectural Principles and Topologies
SDFL architectures have emerged in response to the bottlenecks and fragilities inherent to classic FL and decentralized protocols. Centralized FL funnels all model updates to a single server, limiting scalability, incurring a single point of failure, and suffering under uplink bandwidth constraints. Fully decentralized FL, conversely, relies on peer-to-peer exchange for model parameter dissemination—a scheme that, while robust to server failures, can be slow to converge and communication-intensive.
SDFL operates between these extremes:
- Hierarchical or clustered topologies: Clients are partitioned into disjoint clusters (typically determined by proximity or capabilities), each managed by a local aggregator (edge server or dynamically elected node). These aggregators perform intra-cluster model aggregation and then either: (a) communicate with a global server (star–ring, star–star, or star–mesh architectures), or (b) coordinate among themselves in a decentralized manner (peer-to-peer or partial-mesh inter-aggregator protocols) (Sun et al., 2021, Huang et al., 2024, Wang et al., 2023, Ali-Pour et al., 17 Mar 2025).
- Device-to-Device (D2D) and Device-to-Server (D2S) Integration: Many SDFL variants combine local D2D (or intra-cluster) communication with periodic D2S global aggregation to exploit the reliability, low power, and latency of localized exchanges, while maintaining global statistical convergence (Parasnis et al., 2023, Lin et al., 2021, Weng et al., 2024).
- Edge–Cloud–Device Hierarchies: Multi-tier SDFL frameworks broadly follow the structure: Devices ↔ Edge-Servers ↔ Cloud-Server, with each layer executing model updates and consensus to diminish both communication cost and data drift (Huang et al., 2024, Sun et al., 2021).
This topology reduces peak communication on single nodes, yields increased robustness, and supports parallel, hierarchical, or asynchronous aggregation.
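As a concrete illustration of the clustered topology, the sketch below partitions clients into disjoint clusters and elects one cluster head per cluster. The `build_clusters` helper and the random partition are illustrative stand-ins for the proximity- or capability-based clustering and leader election used in practice.

```python
import random

def build_clusters(client_ids, num_clusters, seed=0):
    """Partition clients into disjoint clusters and elect one
    aggregator (cluster head) per cluster.

    Illustrative only: a random partition stands in for the
    proximity/capability-based clustering used in real deployments."""
    rng = random.Random(seed)
    shuffled = client_ids[:]
    rng.shuffle(shuffled)
    # Round-robin split yields disjoint clusters covering all clients.
    clusters = [shuffled[i::num_clusters] for i in range(num_clusters)]
    # Elect a head per cluster (here: uniformly at random).
    heads = [rng.choice(cluster) for cluster in clusters]
    return clusters, heads

clusters, heads = build_clusters(list(range(12)), 3)
```

Each head would then act as the intra-cluster aggregator and as the cluster's single point of contact with the global tier.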
2. Representative Algorithms, Communication, and Aggregation Protocols
The prototypical SDFL round is composed of local updates, intra-cluster aggregation, and inter-cluster/global synchronization:
- Local Updates: Each client performs several local steps of SGD on its private data.
- Intra-Cluster Aggregation: Cluster aggregator(s) combine local models, normally via weighted averaging by local dataset size (Sun et al., 2021, Ali-Pour et al., 17 Mar 2025).
- Inter-Cluster / Global Aggregation: Aggregators or server(s) combine cluster aggregates into a new global model. Options include:
- Centralized averaging at a parameter server (Lin et al., 2021, Huang et al., 2024).
- Peer-to-peer inter-edge consensus using mixing matrices or gossip (Sun et al., 2021, Sun et al., 2021).
- Incremental subgradient or ring-based incremental aggregation for non-IID robustness (Huang et al., 2024).
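The prototypical round above can be sketched as follows. The names `sdfl_round` and `weighted_average`, and the tuple-based cluster representation, are hypothetical and for illustration only; the local update rule is left abstract.

```python
def weighted_average(models, weights):
    """Average model vectors, weighting each by its local dataset size."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(dim)]

def sdfl_round(clusters, global_model, local_update):
    """One synchronous SDFL round: local updates on each client,
    intra-cluster weighted aggregation, then global aggregation of
    the cluster models (weighted by cluster data size)."""
    cluster_models, cluster_sizes = [], []
    for cluster in clusters:  # cluster = list of (data_size, client_data)
        models = [local_update(global_model, data) for _, data in cluster]
        sizes = [n for n, _ in cluster]
        cluster_models.append(weighted_average(models, sizes))
        cluster_sizes.append(sum(sizes))
    return weighted_average(cluster_models, cluster_sizes)

# Toy usage: each client's "update" simply returns its local target.
clusters = [[(2, [1.0]), (1, [4.0])], [(3, [2.0])]]
new_global = sdfl_round(clusters, [0.0], lambda g, d: d)
```

With the toy update above, the round reduces to a dataset-size-weighted mean over all clients, which is exactly the FedAvg fixed point the hierarchical scheme is meant to preserve.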
Asynchronous SDFL variants are realized by letting edge servers (or cluster heads) set their own aggregation deadlines independently, accept stale or partial updates with staleness-aware weighting, and avoid global round synchronization (Sun et al., 2021, Sun et al., 2021).
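A minimal sketch of the staleness-aware weighting mentioned above, assuming a polynomial decay of the mixing coefficient; the function names and the specific decay law are illustrative, not taken from the cited papers.

```python
def staleness_weight(staleness, decay=0.5):
    """Polynomial staleness decay: older updates contribute less."""
    return (1.0 + staleness) ** (-decay)

def async_aggregate(server_model, client_model, staleness, base_lr=0.5):
    """Mix a (possibly stale) client model into the server model with a
    staleness-discounted mixing coefficient alpha."""
    alpha = base_lr * staleness_weight(staleness)
    return [(1 - alpha) * s + alpha * c
            for s, c in zip(server_model, client_model)]
```

A fresh update (staleness 0) is mixed in at the full base rate, while an update that is three rounds old is discounted to half that weight under this decay law.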
Cooperative and coded relaying strategies, such as collaborative relaying (ColRel) or deterministic diversity network codes, provide straggler resilience and unbiased aggregation in the event of intermittent client–server and client–client connectivity (Yemini et al., 2022, Weng et al., 2024, Weng et al., 2024, Yemini et al., 2022).
Probabilistic and adaptive communication schemes such as PISCO interleave agent-to-server and agent-to-agent rounds, with the mixing probability tuned to network conditions—this enables principled trade-offs between global and local communication (Wang et al., 2023).
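A PISCO-style interleaving of round types can be sketched as an independent Bernoulli choice per round; the mixing probability `p_server` and the helper name are illustrative assumptions, not the paper's notation.

```python
import random

def choose_round_type(p_server, rng):
    """With probability p_server run a device-to-server (D2S) round,
    otherwise a device-to-device (D2D) gossip round."""
    return "D2S" if rng.random() < p_server else "D2D"

# Tune p_server to network conditions: small values favor cheap local
# gossip, large values favor faster global mixing.
rng = random.Random(42)
schedule = [choose_round_type(0.2, rng) for _ in range(1000)]
```

Sweeping `p_server` from 0 to 1 interpolates between fully gossip-based and fully server-centric operation, which is the trade-off the adaptive schemes tune online.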
3. Optimization, Convergence Theory, and Client Dynamics
SDFL leverages advanced optimization and consensus techniques to ensure statistically efficient and robust convergence:
- Convergence Guarantees: Under standard smoothness and convexity assumptions, synchronous and asynchronous SDFL methods yield convergence rates of O(1/√T) (non-convex) or O(1/T) (strongly convex), with explicit dependence on heterogeneity, staleness, and intra-cluster communication parameters (Sun et al., 2021, Sun et al., 2021, Wang et al., 2023).
- Staleness and Straggler Analysis: Asynchronous schemes introduce bias and variance controlled by the maximum update staleness and the client heterogeneity gap. Staleness-aware aggregation and adaptive decay functions mitigate these effects (Sun et al., 2021, Sun et al., 2021).
- Gradient Coding and Diversity Networks: SDFL-coded strategies guarantee exact global gradient recovery under packet loss constraints, with outage probability and convergence rate derived as functions of code redundancy and wireless link statistics (Weng et al., 2024, Weng et al., 2024).
- Client Scheduling and Trust-aware Participation: Dynamic client participation, modeled via hidden semi-Markov models and trust metrics, is used to optimize server load, reliability, and model quality. Greedy and integer-programming-based schedulers select trusted and high-quality participants dynamically per round, preserving convergence rates and reducing training loss (Hu et al., 2024).
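A greedy trust-aware scheduler of the kind described above might look as follows; the `trust * quality` score model and the field names are illustrative assumptions, not the cited paper's formulation.

```python
def greedy_schedule(clients, budget):
    """Greedily pick clients by a trust-weighted quality score until
    the per-round participation budget is exhausted.

    Illustrative score model: score = trust * quality."""
    ranked = sorted(clients,
                    key=lambda c: c["trust"] * c["quality"],
                    reverse=True)
    return [c["id"] for c in ranked[:budget]]

clients = [
    {"id": "a", "trust": 0.9, "quality": 0.8},
    {"id": "b", "trust": 0.2, "quality": 0.9},
    {"id": "c", "trust": 0.7, "quality": 0.7},
]
picked = greedy_schedule(clients, budget=2)
```

An integer-programming variant would replace the greedy ranking with an exact selection under the same budget constraint.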
4. Trust, Incentive Mechanisms, and Blockchain Integration
Large-scale SDFL deployments require robust defenses against faulty or malicious nodes and unreliable participants. Recent work tightly integrates trust evaluation and incentive design within SDFL workflows:
- Trust Score Computation: Each node's trust is a weighted function of accuracy contribution, consistency, data quality, and participation regularity, with explicit decay and recovery (Shrestha, 9 Feb 2026).
- Admission and Reward Policies: Policies gate node admission, place low-trust nodes on probation, and suspend persistently low-contributing participants. Rewards are allocated proportional to a utility score and trust value; slashing is triggered by repeated failures or malicious action (Shrestha, 9 Feb 2026, Shrestha et al., 2023).
- Blockchain and Smart Contracts: Blockchain is employed to store model hashes, enforce incentive policy, and provide distributed, tamper-proof logs. Trust penalization mechanisms reduce the impact of malicious updates, and smart contracts handle slashing and reward distribution (Shrestha et al., 2023, Shrestha, 9 Feb 2026).
- Decentralized Storage: Off-chain storage using IPFS minimizes on-chain storage cost and latency. Model snapshots, round reports, and trust digests are referenced on-chain (typically as Merkle roots) (Shrestha, 9 Feb 2026).
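The trust dynamics described above (decay on inactivity or poor contributions, recovery on useful ones) can be sketched as a simple exponential update; the coefficients `alpha` and `decay` are illustrative values, not parameters from the cited work.

```python
def update_trust(trust, contributed, alpha=0.1, decay=0.05):
    """Exponential trust update, clamped to [0, 1].

    A useful contribution moves trust toward 1 at rate alpha;
    otherwise trust decays multiplicatively toward 0."""
    if contributed:
        trust = trust + alpha * (1.0 - trust)
    else:
        trust = trust * (1.0 - decay)
    return min(1.0, max(0.0, trust))
```

Admission, probation, and slashing policies then reduce to threshold checks on this score, e.g. suspending a node whose trust falls below a floor for several consecutive rounds.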
Empirical results indicate that blockchain integration adds 10–15% computational overhead but does not degrade convergence or accuracy for up to 20 clients (Shrestha et al., 2023).
5. Communication Efficiency, Resource and Energy Considerations
SDFL protocols are explicitly designed to optimize communication load and computational cost:
- Hierarchical and Multi-tier Aggregation: By reducing the number of uplinks to the global server, cluster-based aggregation reduces per-node and total communication. For K clients partitioned into M clusters (M ≪ K), SDFL reduces server-bound messages from K to M per global round (Huang et al., 2024, Ali-Pour et al., 17 Mar 2025).
- Adaptive Participation and Connectivity-Aware Client Selection: SDFL frameworks dynamically sample clients within clusters based on connectivity (out-degree, singular values of cluster adjacency matrices) and energy budgets. Quantitative savings up to 50% in total D2S (device-to-server) transmissions to achieve a given accuracy are established in simulation (Parasnis et al., 2023).
- Partial Model Exchange and Lightweight Neighbor Aggregation: Offline clients, or those with intermittent connections, upload only partial parameters (e.g., the last layer) to neighbors, substantially reducing communication during offline rounds (Bao et al., 3 Sep 2025).
- Compression, Quantization, Gradient Coding: Stochastic gradient quantization, systematic codebooks, and batch splitting further reduce per-message size with negligible accuracy impact (Weng et al., 2024, Weng et al., 2024, Ali-Pour et al., 17 Mar 2025).
- Parallel Local Updates: By configuring the number of local updates performed per global round, SDFL schemes such as PISCO offer linear speedup in communication rounds relative to local computation capacity (Wang et al., 2023).
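As one example of the compression techniques listed above, an unbiased stochastic quantizer can be sketched as below; this is a generic uniform-grid quantizer, not the specific codebook or gradient-coding designs of the cited papers.

```python
import random

def stochastic_quantize(x, levels=4, rng=None):
    """Stochastically round each coordinate to one of `levels` uniform
    grid points spanning [min(x), max(x)].

    Rounding up with probability equal to the fractional remainder
    makes the quantizer unbiased in expectation."""
    rng = rng or random.Random(0)
    lo, hi = min(x), max(x)
    if hi == lo:
        return x[:]
    step = (hi - lo) / (levels - 1)
    out = []
    for v in x:
        t = (v - lo) / step
        base = int(t)
        frac = t - base
        q = base + (1 if rng.random() < frac else 0)
        out.append(lo + min(q, levels - 1) * step)
    return out

q = stochastic_quantize([0.0, 1.0, 0.5], levels=3)
```

Each coordinate can then be transmitted as a small integer index plus the shared (lo, step) pair, shrinking message size at the cost of quantization variance.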
Table: Comparative Communication Overhead per Global Round
| Protocol | Server-Bound Uplinks per Round | D2D / Intra-Cluster Messages |
|---|---|---|
| Centralized FL (FedAvg) | K | 0 |
| Fully decentralized (gossip) | 0 | O(K · node degree) |
| SDFL (K clients, M clusters) | M | ~K |
Empirical results show SDFL can attain parity or improvement in steady-state accuracy (10–25% lower RMSE and faster convergence) over FedAvg, while reducing per-node memory and peak bandwidth by up to 35% (Ali-Pour et al., 17 Mar 2025, Bao et al., 3 Sep 2025).
6. Applications, Use-Cases, and System Implementations
SDFL models have been deployed in diverse contexts:
- Edge IoT Systems: Hierarchical SDFL (e.g., FedSR) efficiently trains deep models in Industrial IoT and sensor networks under strict privacy and bandwidth constraints, handling non-IID data splits and large-scale device populations (Huang et al., 2024).
- Mobile Trajectory and Time-Series Applications: FedDeCAB and derived methods provide robust time-series prediction for vehicle trajectories, marine vessel tracking, and similar large-scale, intermittently connected datasets (Bao et al., 3 Sep 2025).
- Wireless Environments with Unreliable Links: DNC-based coded SDFL and collaborative relaying address straggler and outage resilience in Rayleigh-fading models for massive wireless edge learning (Weng et al., 2024, Weng et al., 2024, Yemini et al., 2022, Yemini et al., 2022).
- Vertical Federated Learning: MTCD enables communication-efficient vertical FL by tuning the relative frequencies of client–client and client–server updates, interpolating between star and peer-to-peer, and supporting O(1/T) convergence (Valdeira et al., 2023).
- Real-Time and Edge-Scale FL: MQTT-based SDFL frameworks (SDFLMQ) dynamically manage clusters, assign aggregator roles, and support large DNN partitioning for resource-constrained clients (Ali-Pour et al., 17 Mar 2025).
- Trustworthy and Incentivized Collaboration: Recent SDFL systems incorporate blockchain-based trust, auditing, and incentives to combat poisoning attacks and encourage quality participation (Shrestha, 9 Feb 2026, Shrestha et al., 2023).
7. Limitations, Open Challenges, and Directions for Future Work
While SDFL models offer compelling advantages, multiple challenges remain:
- Parameter Selection and System Tuning: Optimal choices for cluster sizes, deadlines, aggregation periods, and code redundancy involve trade-offs between communication cost and convergence, often requiring adaptive or online optimization (Sun et al., 2021, Lin et al., 2021, Parasnis et al., 2023).
- Scalability: Blockchain and off-chain orchestration introduce additional storage and latency overhead, which may be addressed by batch finalization, Merkle-tree rollups, or hierarchical incentive mechanisms (Shrestha et al., 2023, Shrestha, 9 Feb 2026).
- Heterogeneity and Fairness: Straggler and device heterogeneity, data non-IIDness, and intermittent connectivity require sophisticated trust-aware, staleness-adaptive scheduling and aggregation. Trust penalization can reduce diversity in participation, possibly slowing final convergence (Hu et al., 2024, Shrestha, 9 Feb 2026).
- Security: Random leader (cluster head) selection is vulnerable to Sybil or malicious-node attacks; verifiable random functions and reputation-based selection are suggested for future defenses (Shrestha et al., 2023).
- Empirical Validation: Many incentive and trust-based SDFL designs remain theoretical, pending extensive evaluation on adversarial and real-world datasets with malicious participant scenarios (Shrestha, 9 Feb 2026).
Further research directions include adaptive topology assignment, integration with large-model optimization, deployment in fully peer-to-peer SDFL, and the synthesis of privacy-enhancing technologies with trust and incentive layers. The interplay of communication complexity, energy usage, and security remains a fertile area for both theoretical and system innovation.