FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent (2310.03156v2)

Published 4 Oct 2023 in cs.LG and cs.DC

Abstract: The theoretical landscape of federated learning (FL) is evolving rapidly, but its practical deployment still faces a series of intricate challenges, among which hyperparameter optimization is critical. Among these hyperparameters, the learning rate stands out as a crucial one, and its adaptation holds the promise of significantly enhancing the efficacy of FL systems. In response to this need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm designed specifically for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as training progresses. In addition, FedHyper not only exhibits strong robustness to a spectrum of initial learning rate configurations but also significantly reduces the need for laborious empirical learning rate tuning. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FedHyper consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper improves accuracy by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
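The core mechanism named in the abstract is hypergradient descent: the learning rate itself is updated with an approximate gradient of the loss with respect to the rate, which reduces to the inner product of consecutive updates. The sketch below illustrates this idea for the global (server-side) rate in a FedAvg-style loop on a toy quadratic problem. It is a minimal sketch, not the paper's implementation; the quadratic objective, the hypergradient step size `hyper_lr`, the clipping range, and all variable names are assumptions made for illustration.

```python
import numpy as np

def local_sgd(w, data_mean, lr=0.05, steps=5):
    """One client's local SGD on the quadratic loss 0.5 * ||w - data_mean||^2."""
    w = w.copy()
    for _ in range(steps):
        grad = w - data_mean          # gradient of the quadratic loss
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
client_means = [rng.normal(size=10) for _ in range(8)]   # heterogeneous client optima

w_global = np.zeros(10)
global_lr = 1.0        # server-side learning rate, adapted online
hyper_lr = 0.1         # hypergradient step size (assumed value, not from the paper)
prev_update = None

for rnd in range(30):
    # Each round: clients train locally, the server aggregates the pseudo-gradients.
    local_models = [local_sgd(w_global, m) for m in client_means]
    update = np.mean([w_global - wl for wl in local_models], axis=0)

    # Hypergradient-style rule: the sign of <update_t, update_{t-1}> indicates
    # whether the current rate is too small (updates agree) or too large (they disagree).
    if prev_update is not None:
        global_lr += hyper_lr * float(update @ prev_update)
        global_lr = float(np.clip(global_lr, 0.1, 10.0))   # keep the rate bounded

    w_global = w_global - global_lr * update
    prev_update = update

optimum = np.mean(client_means, axis=0)
print(f"final global lr: {global_lr:.3f}")
print(f"distance to optimum: {np.linalg.norm(w_global - optimum):.4f}")
```

Growing the rate when consecutive aggregated updates point in the same direction and shrinking it when they disagree is what makes a hypergradient-based scheduler forgiving of a poor initial learning rate choice; the paper extends this principle to both the global and the local rates.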
