
SFL Convergence Bound Analysis

Updated 29 September 2025
  • The literature establishes finite-round convergence upper bounds for SFL by quantifying the influence of heterogeneity, client failures, and model-splitting strategies.
  • Advanced techniques such as electric network analogies, Markov chain coupling, and Lyapunov function arguments form the core of the analytical derivation.
  • The findings offer practical design guidelines for optimizing distributed learning in non-iid, unreliable network environments.

A convergence upper bound for SFL (Split Federated Learning / Sequential Federated Learning), when used in collaborative distributed optimization over heterogeneous data and potentially unreliable or straggler-prone networks, provides a non-asymptotic, finite-round guarantee on the rate at which the distributed system's global model approaches an optimal (or stationary) point. Such a bound quantifies the interplay between system factors (client heterogeneity, communication topology, protocol scheduling, device failures, mini-batch sizes, and model-splitting strategies) and the achievable accuracy or residual disagreement after a finite number of rounds. The contemporary literature has developed a rigorous theory that establishes these upper bounds for distinct forms of SFL (sequential, split, hierarchical multi-tier, and time-driven SFL), tightly characterizing the influence of data, network, and system parameters on the learning rate.
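As a concrete point of reference, the sketch below simulates the sequential, cyclic client-update pattern that these bounds analyze: a single model is passed through all clients in turn, each performing a few local SGD steps on its own heterogeneous objective. The quadratic client objectives and all hyperparameters are toy assumptions, not code or settings from any cited paper.

```python
# A minimal toy sketch (synthetic quadratic client objectives, illustrative
# hyperparameters) of the sequential / cyclic update pattern analyzed by the
# SFL convergence bounds discussed below. Not taken from any cited paper.
import numpy as np

rng = np.random.default_rng(0)
M, K, R, eta, dim = 10, 5, 50, 0.01, 4        # clients, local steps, rounds, step size
local_optima = rng.normal(size=(M, dim))      # heterogeneous local minimizers

def local_grad(x, m):
    """Stochastic gradient of client m's objective 0.5*||x - c_m||^2."""
    return (x - local_optima[m]) + 0.1 * rng.normal(size=dim)

x = np.zeros(dim)
for r in range(R):                            # global rounds
    for m in range(M):                        # visit clients sequentially
        for _ in range(K):                    # K local SGD steps per visit
            x -= eta * local_grad(x, m)

print("final model:", np.round(x, 3))
print("global optimum (mean of local optima):", np.round(local_optima.mean(axis=0), 3))
```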

1. Core Techniques for Convergence Upper Bound Derivation

The theoretical upper bounds are derived via advanced stochastic optimization and Markov process tools. The main ingredients include:

  • Electric Network Analogy and Effective Resistance: For consensus-like protocols, the convergence rate is linked to the network's effective resistance, which bounds the mixing time for information propagation. Specifically, for a weighted network with edge weights $w_{ij}$, the commute time between two nodes $x, y$ is related by $H_{P^B}(x,y) + H_{P^B}(y,x) = w \cdot r'_{xy}$, with $w$ the total network weight and $r'_{xy}$ the effective resistance (Shang et al., 2012, Shang et al., 2014); a numerical illustration follows this list.
  • Random Walks and Meeting/Hitting Times: Distributed model update propagation is mapped to the hitting/meeting times of random walkers (or tokens) on the network graph, with the expected time for consensus or "mixing" set by the maximal hitting time or cover time. In sequential/cyclic update cases, meeting time bounds ($\mathcal{M}(\mathcal{G}) < 4 H_{P^B}(\mathcal{G})$ for the binary consensus algorithm) inform upper limits on convergence time.
  • Markov Chain Coupling and Potential Functions: The convergence of opinions or local models is tied to the expected time for coupled Markov chains (or a defined potential function) to contract to a consensus. For example, $\phi(x,y)$ is constructed via hitting times and analyzed to yield cover and mixing time bounds.
  • Lyapunov Function Arguments and Energetic Reductions: For quantized (finite-value) SFL, the decrease of a system-wide Lyapunov function per meaningful update provides a direct mechanism to upper-bound the total convergence time, by relating the maximal system “energy” to the minimal energy drop per event (Shang et al., 2014).
  • Variance and Heterogeneity Decomposition: For federated and SFL algorithms, the statistical heterogeneity (e.g., $\zeta$, $\zeta_*$) and stochastic gradient noise ($\sigma^2$) are explicitly decomposed in the error terms of the convergence upper bound (Li et al., 2 May 2024, Li et al., 2023).
  • Decoupling of Server- and Client-side Updates in Model Splitting: In split (and hierarchical split) FL, convergence is determined by separate dynamics on either side of the model partition. The difference between client-side and server-side optima is bounded separately, leading to an aggregate error bound (Han et al., 23 Feb 2024, Lin et al., 10 Dec 2024).
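To make the electric-network analogy in the first bullet concrete, the sketch below computes an effective resistance from the graph Laplacian pseudoinverse and plugs it into the commute-time identity $H_{P^B}(x,y) + H_{P^B}(y,x) = w \cdot r'_{xy}$ quoted above. The 4-node weighted graph and the choice of total weight $w$ are illustrative assumptions, not taken from the cited works.

```python
# A minimal sketch of the electric-network analogy: effective resistance between
# two nodes via the Laplacian pseudoinverse, then the commute-time identity
# H(x,y) + H(y,x) = w * r'_xy from the text. The graph below is illustrative.
import numpy as np

def effective_resistance(W: np.ndarray, x: int, y: int) -> float:
    """Effective resistance between nodes x and y for a weighted adjacency matrix W."""
    L = np.diag(W.sum(axis=1)) - W          # weighted graph Laplacian
    L_pinv = np.linalg.pinv(L)              # Moore-Penrose pseudoinverse
    return L_pinv[x, x] + L_pinv[y, y] - 2 * L_pinv[x, y]

# Toy 4-node weighted graph (assumed example).
W = np.array([[0, 1, 0, 2],
              [1, 0, 3, 0],
              [0, 3, 0, 1],
              [2, 0, 1, 0]], dtype=float)

w_total = W.sum()                            # total weight (sum of weighted degrees, assumed normalization)
r_xy = effective_resistance(W, 0, 2)
commute_time = w_total * r_xy                # H(x,y) + H(y,x) per the identity above
print(f"effective resistance r_02 = {r_xy:.3f}, commute time = {commute_time:.3f}")
```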

2. Upper Bound Statements in Representative SFL Models

The established convergence upper bounds across sequential, split, hierarchical, and robust SFL protocols are summarized below. Each reflects a distinct technical scenario, but all are explicit in terms of core system parameters.

Sequential Federated Learning (SFL, cyclic client updates)

For $M$ clients, $K$ local steps, $R$ global rounds, strong convexity $\mu$, smoothness constant $L$, heterogeneity $\zeta_*$, and stochastic variance $\sigma^2$:

  • Strongly convex objective:

$$\mathbb{E}[F(\bar{x}^{(R)})-F(x^*)] \leq \frac{9}{2}\mu D^{2}\exp\left(-\frac{\mu\,\tilde{\eta}\,R}{2}\right) + \frac{12\,\tilde{\eta}\,\sigma^{2}}{M K} + \frac{18\,L\,\tilde{\eta}^{2}\,\sigma^{2}}{M K} + \frac{18\,L\,\tilde{\eta}^{2}\,\zeta_*^{2}}{M}$$

where $\tilde{\eta} = \eta M K$ and $D = \|x^{(0)} - x^*\|$ (Li et al., 2 May 2024, Li et al., 2023); a numerical evaluation of this bound is sketched at the end of this subsection.

  • General convex objective:

$$\mathbb{E}[F(\bar{x}^{(R)})-F(x^*)] \leq O\left( \frac{\sigma D}{\sqrt{M K R}} + \frac{(L\sigma^2 D^4)^{1/3}}{M^{1/3} R^{2/3}} + \frac{(L\zeta_*^2 D^4)^{1/3}}{M^{1/3} R^{2/3}} + \frac{L D^2}{R} \right)$$

  • Nonconvex objective (stationary point finding):

$$\min_{0 \leq r \leq R} \mathbb{E}\|\nabla F(x^{(r)})\|^2 \leq O\left(\frac{(L\sigma^2 A)^{1/2}}{\sqrt{M K R}} + \frac{(L^2\sigma^2 A^2)^{1/3}}{M^{1/3} K^{1/3} R^{2/3}} + \frac{(L^2\zeta^2 A^2)^{1/3}}{M^{1/3} R^{2/3}} + \frac{L A}{R} \right)$$
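For a sense of how the terms of the strongly convex bound trade off, the sketch below evaluates it as a function of the number of rounds $R$. All parameter values are purely illustrative assumptions, not values from the cited papers.

```python
# A minimal sketch with assumed toy parameters: evaluating the strongly convex
# sequential SFL bound above as a function of the number of rounds R.
import math

def sfl_strongly_convex_bound(R, M, K, eta, mu, L, sigma, zeta_star, D):
    eta_tilde = eta * M * K                      # effective step size from the text
    return (4.5 * mu * D**2 * math.exp(-mu * eta_tilde * R / 2)
            + 12 * eta_tilde * sigma**2 / (M * K)
            + 18 * L * eta_tilde**2 * sigma**2 / (M * K)
            + 18 * L * eta_tilde**2 * zeta_star**2 / M)

# Illustrative parameters (assumptions).
params = dict(M=10, K=5, eta=1e-3, mu=0.1, L=1.0, sigma=1.0, zeta_star=1.0, D=10.0)
for R in (10, 100, 1000):
    print(f"R={R:5d}  bound = {sfl_strongly_convex_bound(R, **params):.4f}")
```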

Split Federated Learning (parallel client-side, server-side split)

With $S$-smoothness, strong convexity $\mu$ (where applicable), $T$ rounds, client-side update intervals $\tau_{\max}$, heterogeneity $\epsilon^2$, and participation probability $q_n$ (the underlying split-update mechanics are sketched at the end of this subsection):

  • Strongly convex:

$$\mathbb{E}[f(\boldsymbol{x}^{T})] - f(\boldsymbol{x}^*) = O\left(\frac{1}{T}\right)$$

The bound includes constants depending on $S$, $\mu$, and the client- and server-side update variances, and is additive in the error due to $\epsilon^2$ heterogeneity and dilution factors $1/q_n$ in the partial participation regime (Han et al., 23 Feb 2024).

  • General convex:

$$\mathbb{E}[f(\boldsymbol{x}^{T})] - f^* = O\left(\frac{1}{T^{1/3}}\right)$$

  • Nonconvex:

$$\frac{1}{T}\sum_{t=0}^{T-1}\eta^t\,\mathbb{E}\bigl[\|\nabla f(\boldsymbol{x}^{t})\|^2\bigr] \leq \text{(terms that vanish as } T\to\infty)$$
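The split-update mechanics behind these bounds can be sketched with a tiny two-block linear model: the client computes activations up to the cut layer, the server completes the forward and backward pass, and the cut-layer gradient flows back to the client-side parameters. The model, data, and shapes below are illustrative assumptions; the actual protocol runs this in parallel across clients with periodic aggregation.

```python
# A minimal single-client sketch (assumed tiny linear model, synthetic data) of the
# client/server model split underlying split FL convergence analysis.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_cut, d_out = 8, 4, 1
W_client = rng.normal(scale=0.1, size=(d_in, d_cut))   # client-side sub-model
W_server = rng.normal(scale=0.1, size=(d_cut, d_out))  # server-side sub-model
eta = 0.05

X = rng.normal(size=(32, d_in))
y = X @ rng.normal(size=(d_in, d_out))                  # synthetic regression target

for t in range(200):
    a = X @ W_client                    # client forward pass up to the cut layer
    pred = a @ W_server                 # server-side forward pass
    err = pred - y
    grad_server = a.T @ err / len(X)    # server-side gradient
    grad_cut = err @ W_server.T         # gradient sent back through the cut layer
    grad_client = X.T @ grad_cut / len(X)
    W_server -= eta * grad_server       # server-side update
    W_client -= eta * grad_client       # client-side update

print("final MSE:", float(np.mean((X @ W_client @ W_server - y) ** 2)))
```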

Hierarchical SFL (multi-tier, hybrid split and aggregation)

  • Averaged gradient norm over $R$ rounds, $M$ tiers, aggregation intervals $I_m$:

$$\frac{1}{R} \sum_{t=1}^R \mathbb{E}\bigl[\|\nabla f(\bar{w}^{(t-1)})\|^2\bigr] \leq \frac{2\theta}{\gamma R} + \frac{\beta\gamma \sum_{\ell=1}^L \sigma_\ell^2}{N} + 4\beta^2\gamma^2 \sum_{m=1}^{M-1} \mathbb{1}_{\{I_m>1\}}\, I_m^2 \sum_{\ell} G_\ell^2$$

where the last term is cumulative over sub-models with delayed aggregation (Lin et al., 10 Dec 2024).
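A direct, if simplistic, way to read this bound is to evaluate it for different aggregation intervals. The sketch below uses arbitrary illustrative constants (assumptions, not values from the paper) and shows that the last term disappears when every tier aggregates each round ($I_m = 1$).

```python
# A minimal sketch of the hierarchical SFL bound above with arbitrary constants:
# the last term is active only for tiers whose aggregation interval exceeds 1.
def hsfl_bound(theta, gamma, R, N, beta, sigma_sq, G_sq, intervals):
    # sigma_sq, G_sq: per-sub-model variance / gradient-norm constants (lists)
    term1 = 2 * theta / (gamma * R)
    term2 = beta * gamma * sum(sigma_sq) / N
    term3 = 4 * beta**2 * gamma**2 * sum(I**2 for I in intervals if I > 1) * sum(G_sq)
    return term1 + term2 + term3

base = dict(theta=1.0, gamma=0.1, R=100, N=20, beta=1.0,
            sigma_sq=[1.0, 1.0, 1.0], G_sq=[1.0, 1.0, 1.0])
print(hsfl_bound(**base, intervals=[1, 1]))    # every tier aggregates each round
print(hsfl_bound(**base, intervals=[4, 8]))    # delayed aggregation inflates the bound
```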

HASFL: Batch Size and Model Splitting Optimization

  • Average squared gradient norm (over $R$ rounds):

$$\frac{1}{R} \sum_{t=1}^R \mathbb{E}\bigl[\|\nabla f(w^{(t-1)})\|^2\bigr] \leq \frac{2\theta}{\gamma R} + \frac{\beta\gamma}{N^2} \sum_{i=1}^N \sum_{j=1}^L \frac{\sigma_j^2}{b_i} + \mathbb{1}_{\{I>1\}}\, 4\beta^2\gamma^2 I^2 \sum_{j=1}^{L_c} G_j^2$$

(Lin et al., 10 Jun 2025)
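The role of the batch sizes $b_i$ in this bound can likewise be illustrated numerically (all constants below are arbitrary assumptions): larger batches shrink the variance term, while the indicator term appears only when the aggregation interval $I$ exceeds 1.

```python
# A minimal sketch of the HASFL bound above with arbitrary illustrative constants.
def hasfl_bound(theta, gamma, R, beta, N, sigma_sq, G_sq_client, batch_sizes, I):
    term1 = 2 * theta / (gamma * R)
    term2 = (beta * gamma / N**2) * sum(sum(s / b for s in sigma_sq)
                                        for b in batch_sizes)   # variance term with 1/b_i
    term3 = 4 * beta**2 * gamma**2 * I**2 * sum(G_sq_client) if I > 1 else 0.0
    return term1 + term2 + term3

base = dict(theta=1.0, gamma=0.1, R=100, beta=1.0, N=4,
            sigma_sq=[1.0, 1.0], G_sq_client=[1.0], I=2)
print(hasfl_bound(**base, batch_sizes=[8, 8, 8, 8]))
print(hasfl_bound(**base, batch_sizes=[64, 64, 64, 64]))   # larger batches: lower bound
```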

SFL Under Unstable Client Participation

With client sampling probability $q_i$, per-client drop/failure probabilities ($p_i$, $\phi_i$, $a_i$), and model split positions $L_c^i$:

$$\frac{1}{R}\sum_{t=1}^R \mathbb{E}\bigl[\|\nabla f(w^{(t-1)})\|^2\bigr] \leq \frac{2\theta}{\gamma R} - \sum_i \frac{m_i^2}{q_i} \sum_j G_j^2 + \bigl[\text{error terms depending on } p_i,\ \phi_i,\ a_i,\ L_c^i,\ \sigma_j^2,\ G_j^2, \text{ and the aggregation interval } I\bigr]$$

(Wei et al., 22 Sep 2025)

3. The Impact of System Heterogeneity and Participation Failures

A recurring insight across all upper bounds is the explicit and often multiplicative role of data or device heterogeneity, partial participation, and system-level failures:

  • Heterogeneity Scaling: Error terms due to client drift/heterogeneity ($\zeta$, $\zeta_*$, $\epsilon^2$) scale as $1/M$ in upper bounds for SFL, yielding a provable advantage over parallel FL (PFL) in highly non-iid regimes (Li et al., 2 May 2024, Li et al., 2023).
  • Partial Participation/Stragglers: Dilution of updates through intermittent client participation introduces factors of $1/q_n$, amplifying error and slowing convergence; the bounds accommodate these effects for both sequential and split SFL (Han et al., 23 Feb 2024, Wei et al., 22 Sep 2025).
  • Batch Size and Model Split Optimization: The batch size $b_i$ of edge devices enters denominator terms for variance, suggesting that stronger clients can exploit larger $b_i$ to mitigate stochastic noise; the choice of cut layer $L_c$ modulates the frequency and effect of aggregation errors (Lin et al., 10 Jun 2025).
  • Communication Failures: In SFL under network unreliability, error terms involving the failure probabilities $p_i$, $\phi_i$, $a_i$ enter denominators; the bound increases steeply as these probabilities approach unity (Wei et al., 22 Sep 2025). These qualitative scalings are illustrated in the sketch following this list.
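The sketch below tabulates stand-in versions of these scalings. The expressions are illustrative only, not the papers' exact error terms.

```python
# Stand-in scalings (illustrative assumptions): heterogeneity error shrinking like
# 1/M, participation dilution growing like 1/q, and a failure-dependent term
# blowing up as the failure probability approaches 1.
def heterogeneity_term(zeta_star, M):
    return zeta_star**2 / M            # 1/M scaling noted for SFL

def dilution_factor(q):
    return 1.0 / q                     # partial-participation dilution

def failure_term(p):
    return 1.0 / (1.0 - p)             # assumed stand-in for a failure-dependent term

for M in (5, 20, 100):
    print(f"M={M:3d}  heterogeneity term = {heterogeneity_term(1.0, M):.4f}")
for q in (1.0, 0.5, 0.1):
    print(f"q={q:.1f}  dilution factor    = {dilution_factor(q):.1f}")
for p in (0.1, 0.5, 0.9):
    print(f"p={p:.1f}  failure term       = {failure_term(p):.1f}")
```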

4. Optimization and System Design Implications

By analytically quantifying convergence slowdown due to heterogeneity, failures, or resource imbalance, SFL upper bounds become a formal objective for system co-design:

  • Joint Optimization: The convergence bound provides an objective for the joint optimization of client sampling and model splitting. For example, (Wei et al., 22 Sep 2025) formulates and solves a constrained minimization over $q_i$, $L_c^i$ with closed-form and bisection methods, rigorously controlling system performance under participant instability; a stylized version of this bound-as-objective approach is sketched after this list.
  • Adaptive Aggregation and Splitting: Multi-tier SFL (HSFL) optimizes tier-wise aggregation intervals $I_m$ and split points $\mu_{m,\ell}$ (including via block coordinate descent and Dinkelbach's algorithm) to minimize latency for a given target accuracy (Lin et al., 10 Dec 2024).
  • Aggregation Weighting: Optimized aggregation weights, via discriminative model selection or explicit weight formulas, minimize the upper bound by amplifying reliable/high-contribution clients and filtering low-impact updates (Shao et al., 11 May 2024).
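The following sketch illustrates the bound-as-objective idea in a stylized form. The dilution-style error model $\sum_i c_i/q_i$, the linear cost model, and all numbers are assumptions, not the formulation of Wei et al.; what it shares with the approach described above is the structure of a closed-form inner solution combined with bisection on a Lagrange multiplier.

```python
# A minimal sketch, under assumed stand-in expressions, of using a convergence
# bound as an optimization objective: minimize sum_i c_i / q_i over client
# sampling probabilities q_i subject to an expected resource budget, via a
# closed-form stationary point plus bisection on the Lagrange multiplier.
import math

c = [1.0, 2.0, 0.5, 4.0]       # per-client error weights (illustrative)
cost = [1.0, 1.5, 0.8, 2.0]    # per-client expected cost if sampled (illustrative)
budget = 3.0

def q_of_lambda(lam):
    # Closed-form minimizer of c_i/q_i + lam*cost_i*q_i, clipped to (0, 1].
    return [min(1.0, math.sqrt(ci / (lam * ki))) for ci, ki in zip(c, cost)]

def spent(lam):
    return sum(ki * qi for ki, qi in zip(cost, q_of_lambda(lam)))

lo, hi = 1e-6, 1e6             # bisection bracket for the multiplier
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if spent(mid) > budget:    # over budget -> need a larger multiplier
        lo = mid
    else:
        hi = mid

q_star = q_of_lambda(hi)
print("q* =", [round(q, 3) for q in q_star], " expected cost =", round(spent(hi), 3))
```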

5. Empirical Validation and Practical SFL Performance

Experiments across SFL variants confirm theoretical claims:

  • Sequential SFL outperforms PFL (parallel FL) in highly heterogeneous regimes, achieving higher accuracy and faster (round-wise) convergence when client data distributions are skewed (e.g., $81.05\%$ vs. $73.84\%$ on CIFAR-10 with $C=1$ class per client; Li et al., 2 May 2024).
  • Split, hierarchical, and HASFL approaches demonstrate significant gains in speed and model quality under realistic non-iid, straggler-prone, or resource-imbalanced settings, attributable to the theoretical guidance provided by convergence upper bounds (Lin et al., 10 Dec 2024, Lin et al., 10 Jun 2025, Wei et al., 22 Sep 2025).
  • Adversarial or partial participation scenarios are directly addressed via participation probabilities and model split depth adaptation, yielding robust performance under volatile edge participation.

6. Theoretical Significance and Open Directions

The current theory resolves the “SFL convergence dilemma” by demonstrating that sequential (or appropriately split/optimized) federated algorithms can provably outperform classical PFL methods under realistic system constraints. The explicit convergence upper bounds quantify trade-offs and inform optimal system control, model partitioning policies, sampling schedules, and aggregation strategies on resource-constrained, failure-prone, or non-iid edge networks.

Future analysis may extend these results to more expressive model families, adversarial participation, or finer-grained statistical heterogeneity, potentially incorporating minimax or lower-bound gap analyses for stronger guarantees in both small- and large-scale federated deployments.
