
Semi-Decentralized Federated Learning

Updated 21 November 2025
  • Semi-Decentralized Federated Learning is a distributed ML architecture that combines centralized and decentralized approaches through multi-level aggregation to enhance scalability and privacy.
  • It employs asynchronous, staleness-aware protocols that adapt to heterogeneous client capacities, mitigating stragglers and balancing data variability.
  • Empirical results indicate that SDFL can achieve up to 2× faster convergence and 3–5% higher early test accuracy compared to synchronous methods under high heterogeneity.

Semi-decentralized federated learning (SDFL) refers to a class of distributed machine learning architectures that combine hierarchical or hybrid aggregation patterns, typically via clusters or intermediate servers, bridging the extremes of fully centralized federated learning (FL) and fully decentralized (peer-to-peer) learning. SDFL architectures deliver improved scalability, communication efficiency, and robustness by decomposing the model aggregation and consensus procedures across multiple, often dynamically organized, regions of a network—such as edge clusters, edge servers, or client clusters—while still preserving the core privacy protections of classical FL. The flexibility and design space of SDFL frameworks support diverse communication models, advanced straggler and staleness mitigation, heterogeneous data and device regimes, and dynamic client populations.

1. Architectural Foundations and Design Patterns

SDFL architectures consist of at least two levels of aggregation—a local or edge-tier and a higher-tier (edge/server/cloud)—with model flows partitioned into local training, intra-cluster aggregation, and inter-cluster (or global) aggregation phases. In canonical edge SDFL frameworks (Sun et al., 2021, Sun et al., 2021), the network comprises $C$ client devices statically associated with $D$ edge servers, forming $D$ disjoint “edge clusters” $\mathcal{C}_1, \dots, \mathcal{C}_D$. Each edge server communicates directly with its associated clients for local aggregation and with neighboring servers across a high-speed, typically wired, backbone. The aggregation topology of edge servers is often encoded via a binary adjacency matrix $G \in \{0,1\}^{D \times D}$.

The typical SDFL update flow:

  • Local Training: Each client $i$ in cluster $d$ performs local mini-batch SGD or other optimizers on its private data.
  • Intra-cluster Aggregation: At a cluster-specific deadline or after a fixed number of local epochs, the edge server aggregates normalized model updates $\Delta_k^{(i)}$ using a weighted mean, updating the local server-side model $y_k^{(d)}$.
  • Inter-cluster Aggregation: Edge servers exchange their cluster models with neighbors, performing a distributed consensus update using a mixing matrix $P_k$.
  • Broadcast: The refreshed edge model $y_k^{(d)}$ is then broadcast to the cluster’s clients as the starting point for the next round.

This structure generalizes to more flexible hierarchies (e.g., multi-hop, ring, or mesh edge-server topologies (Sun et al., 2021, Sun et al., 2021)), and can be extended by incorporating device-to-device protocols, cluster-head rotation, or application-specific communication models.
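The four-phase round above can be sketched in a few lines of code. The following toy simulation is illustrative only (the quadratic per-client loss, uniform aggregation weights, and the ring of three edge servers are assumptions, not details from the cited papers), but it follows the local-training, intra-cluster, inter-cluster, and broadcast phases in order:

```python
# Minimal sketch of one synchronous SDFL round on a toy quadratic objective.
# All names (local_sgd, sdfl_round, ...) are illustrative, not from the cited papers.
import numpy as np

rng = np.random.default_rng(0)
DIM, D, CLIENTS_PER_CLUSTER, TAU, ETA = 10, 3, 4, 5, 0.05

# Each client holds a private quadratic loss f_i(w) = 0.5 * ||w - t_i||^2,
# so its stochastic gradient is (w - t_i) plus noise.
targets = [[rng.normal(size=DIM) for _ in range(CLIENTS_PER_CLUSTER)] for _ in range(D)]

# Ring topology over edge servers, encoded as a binary adjacency matrix G,
# with a doubly stochastic mixing matrix P (uniform over self + neighbors).
G = np.zeros((D, D), dtype=int)
for d in range(D):
    G[d, (d - 1) % D] = G[d, (d + 1) % D] = 1
P = (G + np.eye(D)) / (G.sum(axis=1, keepdims=True) + 1)

def local_sgd(w0, target, tau=TAU, eta=ETA):
    """Run tau local SGD steps and return the normalized increment Delta_k^(i)."""
    w = w0.copy()
    for _ in range(tau):
        grad = (w - target) + 0.01 * rng.normal(size=DIM)  # noisy gradient
        w -= eta * grad
    return (w - w0) / tau

def sdfl_round(edge_models):
    # Phase 1 + 2: local training, then intra-cluster aggregation of increments.
    # A uniform mean is used here; real systems weight by client data sizes.
    new_edge = []
    for d in range(D):
        deltas = [local_sgd(edge_models[d], t) for t in targets[d]]
        new_edge.append(edge_models[d] + np.mean(deltas, axis=0))
    # Phase 3: inter-cluster consensus step with the mixing matrix P.
    mixed = [sum(P[d, j] * new_edge[j] for j in range(D)) for d in range(D)]
    # Phase 4: broadcast -- the mixed edge model starts the next round.
    return mixed

edge_models = [np.zeros(DIM) for _ in range(D)]
for k in range(20):
    edge_models = sdfl_round(edge_models)
print(np.round(np.array(edge_models)[:, :3], 3))  # first coords of each edge model
```

In a real deployment the per-client weights would reflect data set sizes and the mixing matrix $P_k$ would be chosen to match the edge-server backbone.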

2. Asynchronous and Staleness-Aware Protocols

In practice, device and communication heterogeneity lead to significant straggler and staleness effects. SDFL explicitly addresses these by decoupling the global iteration pace from the slowest subnetwork component. Each edge server $d$ can independently set a compute interval $T_\mathrm{comp}^{(d)}$, choosing aggregation deadlines aligned to its clients’ resource capacities (FLOPS, bandwidth), so that fast clients perform more SGD steps per round. Normalization of model increments by the number of local epochs $\tau_i$ ensures statistical fairness and numeric stability:

$$\Delta_k^{(i)} = \frac{1}{\tau_i}\left(w_{k,\tau_i}^{(i)} - w_{k,0}^{(i)}\right) = -\frac{\eta}{\tau_i} \sum_{\ell=0}^{\tau_i-1} g\!\left(\xi_{k,\ell}^{(i)}; w_{k,\ell}^{(i)}\right)$$
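The two forms are equal because the local SGD steps telescope: $w_{k,\tau_i}^{(i)} - w_{k,0}^{(i)} = -\eta \sum_\ell g(\xi_{k,\ell}^{(i)}; w_{k,\ell}^{(i)})$. A minimal numerical check, using a toy quadratic loss as a stand-in for the client objective:

```python
# Sketch: verify that (w_tau - w_0)/tau equals -(eta/tau) * sum of the gradients taken.
import numpy as np

rng = np.random.default_rng(1)
dim, tau, eta = 5, 8, 0.1
w0, target = rng.normal(size=dim), rng.normal(size=dim)

w, grads = w0.copy(), []
for _ in range(tau):
    g = (w - target) + 0.01 * rng.normal(size=dim)  # stochastic gradient g(xi; w)
    grads.append(g)
    w = w - eta * g                                  # plain SGD step

delta_from_weights = (w - w0) / tau
delta_from_grads = -(eta / tau) * np.sum(grads, axis=0)
assert np.allclose(delta_from_weights, delta_from_grads)
print("normalized increment:", np.round(delta_from_weights, 4))
```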

When edge servers aggregate asynchronously, each cluster triggers intra- and inter-cluster updates upon local completion, with no global synchronization barrier. The staleness of peer models entering an aggregation step is measured via $\delta_k^{(j)} = k - k'(j)$ (model age), and their contributions are discounted by a non-increasing weight function $\psi(\delta)$, such as $\psi(\delta) = 1/[2(\delta+1)]$ (Sun et al., 2021). Inter-cluster mixing weights $p_k^{i,j}$ are computed as normalized staleness-aware weights, ensuring that stale models have proportionally reduced effect in the aggregation.
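A minimal sketch of this weighting, assuming the local (fresh) model enters with staleness zero and that the weights are simply renormalized to sum to one (the exact normalization behind $p_k^{i,j}$ is specified in the cited work):

```python
# Staleness-aware mixing weights: psi(delta) = 1 / (2 * (delta + 1)), then renormalize.
import numpy as np

def staleness_weights(current_round, neighbor_rounds):
    """Normalized weights for the local model plus each neighbor's cached model."""
    psi = lambda delta: 1.0 / (2.0 * (delta + 1))
    raw = np.array([psi(0)] + [psi(current_round - kj) for kj in neighbor_rounds])
    return raw / raw.sum()

# Cluster at round k=10 mixing with neighbors whose cached models date from rounds 10, 7, 3.
print(staleness_weights(10, [10, 7, 3]))  # fresher neighbors receive larger weight
```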

This protocol ensures overall system progress despite stragglers, enabling edge clusters to proceed at their fastest safe pace and providing robustness in highly heterogeneous networks (Sun et al., 2021, Sun et al., 2021).

3. Convergence Theory and Performance Guarantees

SDFL frameworks are supported by rigorous convergence analyses under assumptions of $L$-smoothness, unbiased and bounded-variance SGD, and bounded data heterogeneity. Using a global auxiliary model $\bar{y}_k = \sum_d \tilde{m}_d y_k^{(d)}$ (weighted by cluster data proportions), the main result for asynchronous SDFL with staleness-aware dampening states:

$$\frac{1}{K} \sum_{k=0}^{K-1} \mathbb{E}\!\left[\|\nabla F(\bar{y}_k)\|^2\right] \leq \mathcal{O}\!\left(\frac{1}{K}\right) + \mathcal{O}\!\left(\eta L H^2 \sigma^2\right) + \mathcal{O}\!\left(A \sigma^2 + B \kappa^2\right)$$

where $H$ is the device heterogeneity gap, $\sigma^2$ the SGD variance, and $A, B$ are polynomials in the maximum staleness and $H$ (Sun et al., 2021). Choosing $\eta = \mathcal{O}(1/(L\sqrt{K}))$ ensures a vanishing optimality gap of $\mathcal{O}(1/\sqrt{K})$. If $H = 1$ and staleness is absent ($\delta_{\max} = 0$), the bound matches synchronous SD-FEEL rates (Sun et al., 2021).
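As a rough sanity check on this learning-rate choice (treating constants loosely; the precise dependence of $A$ and $B$ on $\eta$ and $K$ is given in the cited analysis), substituting $\eta = c/(L\sqrt{K})$ into the middle term gives

$$\mathcal{O}\!\left(\eta L H^2 \sigma^2\right) = \mathcal{O}\!\left(\frac{c}{L\sqrt{K}} \cdot L H^2 \sigma^2\right) = \mathcal{O}\!\left(\frac{H^2 \sigma^2}{\sqrt{K}}\right),$$

so this term decays at the $1/\sqrt{K}$ rate, consistent with the stated overall $\mathcal{O}(1/\sqrt{K})$ optimality gap once the $\mathcal{O}(1/K)$ term and the staleness-dependent remainder are accounted for.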

Significantly, larger heterogeneity or staleness increases the constants in the convergence bound, reflecting the necessity of proper normalization and staleness discounting to guarantee stable learning (Sun et al., 2021, Sun et al., 2021).

4. Impact of Heterogeneity and Communication Topology

SDFL’s advantage is most pronounced in non-IID settings and heterogeneous networks, as validated by large-scale experiments (e.g., 30 clients, 6 clusters, ResNet-18 on CIFAR-10, Dirichlet non-IID splits) (Sun et al., 2021). Asynchrony and staleness-aware aggregation enable up to $2\times$ faster convergence in wall-clock time compared to synchronous protocols under substantial heterogeneity gaps (e.g., $H = 5, 10, 30$), directly reducing idle waiting time and improving early test accuracy by +3–5% at fixed timepoints.

The trade-off is a modest increase in communication load, as asynchronous SDFL introduces more frequent inter-cluster exchanges. Topology also significantly impacts performance: denser edge-server graphs (smaller spectral radius $\zeta$ of the mixing matrix) permit faster consensus and reduced inter-cluster communication steps (Sun et al., 2021). For ring topologies ($\zeta \approx 0.6$), more mixing steps are generally required per round for near-ideal consensus (Sun et al., 2021, Sun et al., 2021).
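As a rough illustration of the topology effect, the snippet below builds uniform-weight mixing matrices for a ring and for a fully connected graph of $D = 6$ edge servers and compares their consensus factors. Treating $\zeta$ as the second-largest eigenvalue magnitude of the mixing matrix is an assumption here (it is the quantity that typically governs consensus speed); the weights and $D$ are likewise illustrative:

```python
# Compare the consensus factor of a ring versus a fully connected edge-server graph,
# using uniform weights over self + neighbors for the mixing matrix.
import numpy as np

def mixing_matrix(adjacency):
    degree = adjacency.sum(axis=1, keepdims=True)
    return (adjacency + np.eye(len(adjacency))) / (degree + 1)

def second_eigenvalue(P):
    """Second-largest eigenvalue magnitude (assumed here to play the role of zeta)."""
    return np.sort(np.abs(np.linalg.eigvals(P)))[::-1][1]

D = 6
ring = np.zeros((D, D))
for d in range(D):
    ring[d, (d - 1) % D] = ring[d, (d + 1) % D] = 1
full = np.ones((D, D)) - np.eye(D)

print("ring zeta ~", round(second_eigenvalue(mixing_matrix(ring)), 3))   # ~0.667
print("full zeta ~", round(second_eigenvalue(mixing_matrix(full)), 3))   # ~0.0
```

The ring value of roughly $0.67$ is in the vicinity of the $\zeta \approx 0.6$ quoted above, while the fully connected graph reaches exact consensus in a single mixing step.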

5. Algorithmic Realizations and Practical Implementation

A standard SDFL pseudocode entails:

  • Parallel local updates at all clients in a cluster.
  • Intra-cluster aggregation: weighted average of normalized increments.
  • Asynchronous trigger: upon cluster-specific deadline or completion, the associated edge server initiates inter-cluster aggregation without waiting for others, incrementing a global round index.
  • Inter-cluster model exchange: staleness-aware averaging over direct neighbors using mixing matrix $P_k$.
  • Broadcasting cluster models to all local clients for the next training phase.
  • Final global model consensus as a weighted sum of edge models.
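A compact event-driven sketch of the asynchronous trigger logic is given below. The compute intervals, the fully connected neighbor map, and the scalar stand-in for local training are all illustrative assumptions; the point is the bookkeeping: whichever cluster finishes first bumps the global round index, mixes with its neighbors' cached (possibly stale) models via $\psi$, and schedules its own next completion without any barrier:

```python
# Event-driven sketch of asynchronous cluster triggering with staleness tracking.
import heapq

D = 3
compute_interval = [1.0, 1.7, 3.0]              # heterogeneous per-cluster round times
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # fully connected (illustrative)
cached_model = [0.0] * D                        # last model each cluster shared (scalar toy)
cached_round = [0] * D                          # global round index of that model
psi = lambda delta: 1.0 / (2.0 * (delta + 1))

global_round = 0
events = [(compute_interval[d], d) for d in range(D)]   # (finish_time, cluster)
heapq.heapify(events)

while global_round < 8:
    finish_time, d = heapq.heappop(events)
    global_round += 1                            # async trigger: no barrier, bump round index

    local_model = cached_model[d] + 1.0          # stand-in for local + intra-cluster result

    # Staleness-aware averaging over direct neighbors' cached models.
    weights = [psi(0)] + [psi(global_round - cached_round[j]) for j in neighbors[d]]
    models = [local_model] + [cached_model[j] for j in neighbors[d]]
    mixed = sum(w * m for w, m in zip(weights, models)) / sum(weights)

    cached_model[d], cached_round[d] = mixed, global_round
    print(f"t={finish_time:4.1f}s cluster {d} -> round {global_round}, model {mixed:.2f}")

    # Schedule this cluster's next completion; slow clusters simply trigger less often.
    heapq.heappush(events, (finish_time + compute_interval[d], d))
```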

Staleness metrics and normalization weights are computed and updated online. Simulation setups are realistic, incorporating nontrivial latency and bandwidth models (client–server links at $5$ Mbps, server–server links at $10$ Mbps, and realistic per-device FLOPS budgets), and client–cluster assignments reflect static or topology-aware distributions (Sun et al., 2021).
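Under those bandwidth figures, per-round communication costs are straightforward to estimate; the model size used below is an arbitrary placeholder, not a value from the cited setup:

```python
# Back-of-the-envelope per-transfer times under the quoted link bandwidths.
MODEL_MB = 25                                   # placeholder model size (assumption)
client_link_mbps, server_link_mbps = 5, 10

upload_s = MODEL_MB * 8 / client_link_mbps      # one client -> edge server transfer
mix_step_s = MODEL_MB * 8 / server_link_mbps    # one server -> neighboring server transfer
print(f"client upload ~{upload_s:.0f} s, one inter-cluster exchange ~{mix_step_s:.0f} s")
```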

6. Trade-offs and Limitations

While asynchronous SDFL architectures substantially mitigate the impact of device and data heterogeneity, careful calibration of normalization, staleness weight functions, and the balance between communication overhead and model consensus is essential. High $H$ or excessive staleness without properly tuned discounting can slow convergence or introduce early-round statistical bias (as fast clients may dominate initial learning without reweighting). Moreover, although total communication per epoch may rise, the reduction in system idle time and the increased effective throughput generally outweigh these costs in practical networked environments (Sun et al., 2021, Sun et al., 2021).

The general framework does not preclude integration with further adaptations—adaptive cluster partitioning, trust/fault-aware scheduling, straggler-resilient coding, or advanced privacy-preserving techniques.

7. Comparative Summary and Empirical Results

Simulation studies demonstrate:

  • For $H \in \{5, 10, 30\}$, asynchronous SD-FEEL achieves $2\times$ faster wall-clock convergence versus synchronous protocols for fixed total epochs.
  • Short-term test accuracy improvements (e.g., +3–5% at 2000 s of wall-clock training) are substantial; synchronous approaches may eventually match final accuracy but incur much higher latency due to inefficiency in straggler handling.
  • Communication–latency and accuracy–fairness trade-offs are tunable via aggregation parameters and staleness weights.
  • Empirical evaluations confirm that synchronous variants (waiting for all clusters) are dominated in both convergence speed and efficiency by asynchronous SDFL when device and link heterogeneity is pronounced (Sun et al., 2021).

Semi-decentralized federated learning, by introducing cluster-based, staleness- and heterogeneity-aware asynchrony and consensus, constitutes an essential evolution of scalable, robust federated learning in heterogeneous and bandwidth-constrained edge and IoT networks.

Key References: (Sun et al., 2021, Sun et al., 2021, Sun et al., 2021)
