
Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

Published 26 Jan 2022 in cs.LG and math.OC | (2201.11066v1)

Abstract: We present a theoretical study of server-side optimization in federated learning. Our results are the first to show that the widely popular heuristic of scaling the client updates with an extra parameter is very useful in the context of Federated Averaging (FedAvg) with local passes over the client data. Each local pass is performed without replacement using Random Reshuffling, which is a key reason we can show improved complexities. In particular, we prove that whenever the local stepsizes are small, and the update direction is given by FedAvg in conjunction with Random Reshuffling over all clients, one can take a big leap in the obtained direction and improve rates for convex, strongly convex, and non-convex objectives. In particular, in non-convex regime we get an enhancement of the rate of convergence from $\mathcal{O}\left(\varepsilon^{-3}\right)$ to $\mathcal{O}\left(\varepsilon^{-2}\right)$. This result is new even for Random Reshuffling performed on a single node. In contrast, if the local stepsizes are large, we prove that the noise of client sampling can be controlled by using a small server-side stepsize. To the best of our knowledge, this is the first time that local steps provably help to overcome the communication bottleneck. Together, our results on the advantage of large and small server-side stepsizes give a formal justification for the practice of adaptive server-side optimization in federated learning. Moreover, we consider a variant of our algorithm that supports partial client participation, which makes the method more practical.

Citations (24)

Summary

  • The paper introduces a novel FL algorithm that leverages server-side stepsize modulation and sampling without replacement to improve convergence in convex, strongly convex, and non-convex settings.
  • It provides a rigorous complexity analysis, demonstrating a convergence rate improvement from O(ε⁻³) to O(ε⁻²) in the non-convex regime through adaptive server-side strategies.
  • Experimental validations confirm that tuning server and client stepsizes reduces gradient variance and communication overhead, enhancing practical decentralized learning.

Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization

Introduction

This paper presents a theoretical study on enhancing federated learning through server-side optimization techniques, specifically focusing on stepsize management on the server and the strategic use of sampling without replacement. Federated Learning (FL) is a decentralized approach to training machine learning models across distributed and heterogeneous client data sources, as initially introduced by Konečný et al. and McMahan et al. in the mid-2010s. The main drivers for FL are privacy preservation and reduced communication overhead by performing training locally on client datasets.

Problem Formulation

The core problem is the optimization of a global objective function that is a sum of local loss functions distributed across numerous clients:

$$f(x) = \frac{1}{M} \sum_{m=1}^{M} f_{m}(x)$$

where $M$ is the number of clients, and each $f_m$ represents the loss function over the local data of client $m$. Each local function $f_m$ is further decomposed into a finite-sum structure:

$$f_m(x) = \frac{1}{n}\sum_{i=1}^{n} f^i_m(x)$$

where the $f^i_m$ are the losses over individual data points on the client. The paper addresses the diverse regimes of convex, strongly convex, and non-convex objectives in FL.
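
To make the finite-sum structure concrete, here is a minimal sketch of how the local and global objectives compose. The least-squares per-example loss, the variable names, and the data layout are illustrative assumptions, not the paper's specific model.

```python
import numpy as np

def per_example_loss(x, example):
    """Illustrative per-example loss f_m^i(x): a least-squares term.
    (The paper's analysis applies to general smooth losses.)"""
    a, b = example                       # feature vector a, scalar target b
    return 0.5 * (a @ x - b) ** 2

def local_loss(x, client_data):
    """Local objective f_m(x) = (1/n) * sum_i f_m^i(x) on one client."""
    return np.mean([per_example_loss(x, ex) for ex in client_data])

def global_loss(x, clients):
    """Global objective f(x) = (1/M) * sum_m f_m(x) across M clients."""
    return np.mean([local_loss(x, data) for data in clients])
```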

Contributions

The paper introduces several notable contributions aimed at improving FL's efficacy:

  • Algorithm Design: A novel FL algorithm named Nastya combines existing techniques such as partial client participation, local client training, data reshuffling, and adaptive server stepsizes. Notably, the algorithm modulates a server-side learning rate while clients perform local training using Random Reshuffling (RR); a minimal sketch of one communication round appears after this list.
  • Complexity Analysis: It provides rigorous complexity bounds for the proposed method across different convexity regimes, demonstrating substantial improvements in convergence rates. In the non-convex regime, small client stepsizes combined with a large server stepsize enable a leap from $\mathcal{O}(\varepsilon^{-3})$ to $\mathcal{O}(\varepsilon^{-2})$ in the convergence rate.
  • Theoretical Insights: The paper presents formal justification for empirical observations in FL, particularly showing how server-side stepsize adjustment and sampling strategy mitigate communication bottlenecks and improve learning rates.
  • Experimental Validation: Through numerical simulations, it validates theoretical predictions and showcases how adaptive strategies bolster FL performance in practical scenarios compared to existing benchmarks.
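
To illustrate the server/client split described above, the following is a minimal sketch of one communication round in the spirit of the method: each participating client makes one local pass over its data with Random Reshuffling at a small client stepsize, and the server averages the resulting update directions and rescales them with its own (typically larger) stepsize. All names (`local_rr_epoch`, `gamma_client`, `gamma_server`, `grad_fn`) are illustrative assumptions rather than the paper's notation, and details such as the number of local epochs are simplified.

```python
import numpy as np

def local_rr_epoch(x, client_data, gamma_client, grad_fn, rng):
    """One local pass over the client's examples, sampled without
    replacement (Random Reshuffling)."""
    for i in rng.permutation(len(client_data)):   # fresh shuffle each epoch
        x = x - gamma_client * grad_fn(x, client_data[i])
    return x

def server_round(x, clients, gamma_client, gamma_server, grad_fn, rng):
    """One communication round: clients run RR locally; the server averages
    their update directions and applies them with the server stepsize."""
    deltas = []
    for client_data in clients:
        x_local = local_rr_epoch(x.copy(), client_data, gamma_client, grad_fn, rng)
        deltas.append(x - x_local)                # client's aggregate update direction
    avg_delta = np.mean(deltas, axis=0)
    # With small client stepsizes, the averaged RR direction closely tracks the
    # full gradient, so the server can "take a big leap" via a larger gamma_server.
    return x - gamma_server * avg_delta
```

In a partial-participation variant, `clients` would be replaced by a freshly sampled subset of clients in each round.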

Experimental Observations

Figure 1: Illustration of the dependence between server and client stepsizes on a simple example with $M=2$ clients.

The experiments demonstrate that correctly tuning the server-side stepsize in conjunction with small client stepsizes can significantly reduce gradient variance, supporting faster and more reliable convergence in FL tasks. By comparing RR, adaptive gradient descent, and the proposed combined approach, the experiments confirm that integrating a server-side stepsize yields better optimization and communication efficiency.
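
As a purely illustrative complement (not the paper's experimental setup), one can probe this stepsize interplay on synthetic least-squares data by sweeping pairs of client and server stepsizes, reusing `server_round` and the quadratic loss from the sketches above; all constants below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_per_client, M = 10, 32, 2
clients = [[(rng.normal(size=d), rng.normal()) for _ in range(n_per_client)]
           for _ in range(M)]

def grad_fn(x, example):
    a, b = example
    return (a @ x - b) * a                        # gradient of 0.5 * (a^T x - b)^2

for gamma_client in (1e-3, 1e-2):
    for gamma_server in (0.5, 1.0, 2.0):
        x = np.zeros(d)
        for _ in range(200):                      # communication rounds
            x = server_round(x, clients, gamma_client, gamma_server, grad_fn, rng)
        final = np.mean([0.5 * (a @ x - b) ** 2 for c in clients for a, b in c])
        print(f"gamma_client={gamma_client:g}  gamma_server={gamma_server:g}  loss={final:.4f}")
```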

Implications and Theoretical Insights

The findings highlight the critical interplay between server-side and client-side optimization in federated systems and the need to adapt these techniques for robust decentralized learning. By establishing theoretical bounds alongside practical improvements, the paper advocates for wider adoption of adaptive stepsize strategies in FL frameworks, leveraging local computation while keeping server-client communication overhead low.

Conclusion

The research advances the theoretical understanding and practical application of federated learning, specifically through the adaptation of server-side stepsizes and sampling strategies. By incorporating techniques like Random Reshuffling and adaptive learning rates, FL can achieve improved convergence rates and reduced communication complexity, opening avenues for further investigation into adaptive federated systems and deployment strategies across heterogeneous environments.
