
Convergence Analysis of Sequential Split Learning on Heterogeneous Data (2302.01633v3)

Published 3 Feb 2023 in cs.LG

Abstract: Federated Learning (FL) and Split Learning (SL) are two popular paradigms of distributed machine learning. By offloading the computation-intensive portions to the server, SL is promising for deep model training on resource-constrained devices, yet it still lacks a rigorous convergence analysis. In this paper, we derive convergence guarantees of Sequential SL (SSL, the vanilla case of SL that conducts model training in sequence) for strongly convex, general convex, and non-convex objectives on heterogeneous data. Notably, the derived guarantees suggest that SSL is better than Federated Averaging (FedAvg, the most popular algorithm in FL) on heterogeneous data. We validate this counterintuitive analysis result empirically on extremely heterogeneous data.
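To make the training pattern the abstract describes concrete, here is a minimal sketch of Sequential Split Learning: the model is cut into a client-side part and a server-side part, and clients are visited one after another, each continuing from the client-side weights left by the previous client. The toy model, cut layer, dataset sizes, and hyperparameters below are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal SSL sketch (PyTorch). Assumes a synthetic 10-class problem with
# 20 input features; the "cut" between client and server is after layer 1.
import torch
import torch.nn as nn

torch.manual_seed(0)

client_model = nn.Sequential(nn.Linear(20, 64), nn.ReLU())   # runs on each device
server_model = nn.Sequential(nn.Linear(64, 10))               # runs on the server

opt_client = torch.optim.SGD(client_model.parameters(), lr=0.1)
opt_server = torch.optim.SGD(server_model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One synthetic (features, labels) local dataset per client, standing in for
# the heterogeneous data considered in the paper.
clients = [(torch.randn(32, 20), torch.randint(0, 10, (32,))) for _ in range(5)]

for rnd in range(3):                      # training rounds
    for x, y in clients:                  # clients are processed *sequentially*
        opt_client.zero_grad()
        opt_server.zero_grad()

        # Client computes activations at the cut layer and sends them to the server.
        activations = client_model(x)

        # Server finishes the forward pass and backpropagates; gradients flow
        # back through the activations to update the client-side model too.
        loss = loss_fn(server_model(activations), y)
        loss.backward()

        opt_server.step()
        opt_client.step()
    print(f"round {rnd}: last client loss = {loss.item():.3f}")
```

In contrast to FedAvg, where all clients update local copies in parallel and the server averages them, here each client's update builds directly on the previous client's model, which is the sequential structure the convergence analysis targets.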
