Federated Stochastic Gradient Descent Begets Self-Induced Momentum (2202.08402v1)
Abstract: Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process. Based on this finding, we further analyze the convergence rate of a federated learning system by accounting for the effects of parameter staleness and communication resources. These results advance the understanding of the Federated SGD algorithm and also forge a link between staleness analysis and federated computing systems, which can be useful for system designers.
- Howard H. Yang (65 papers)
- Zuozhu Liu (78 papers)
- Yaru Fu (25 papers)
- Tony Q. S. Quek (237 papers)
- H. Vincent Poor (884 papers)