FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent (2310.03156v2)
Abstract: While the theory of federated learning (FL) is evolving rapidly, its practical deployment still faces a number of challenges, and hyperparameter optimization is one of the most critical. Among these hyperparameters, the learning rate stands out: adapting it well can substantially improve the effectiveness of an FL system. To address this need, this paper presents FedHyper, a hypergradient-based learning rate adaptation algorithm designed specifically for FL. FedHyper is a universal learning rate scheduler that adapts both the global and the local rates as training progresses. It is also robust to a wide range of initial learning rate configurations, which greatly reduces the need for laborious empirical tuning. We provide a theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmarks. The results show that FedHyper consistently converges 1.1-3x faster than FedAvg and competing baselines while achieving higher final accuracy. Moreover, under suboptimal initial learning rate settings, FedHyper improves accuracy by up to 15% over FedAvg.
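As background for how a hypergradient-based scheduler adapts a rate on the fly, the sketch below shows the generic hypergradient-descent update (Baydin et al., 2017) applied to a scalar learning rate and driven by the dot product of consecutive (pseudo-)gradients. The `beta` hyper step size, the synthetic update vectors, and the function name are illustrative assumptions for a toy server-side loop, not FedHyper's exact global and local update rules.

```python
import numpy as np

def hypergradient_lr_update(lr, beta, delta_t, delta_prev):
    """One hypergradient-descent step on a scalar learning rate.

    The rate grows when consecutive (pseudo-)gradients point in the same
    direction (positive dot product) and shrinks when they conflict.
    """
    hypergrad = float(np.dot(delta_t.ravel(), delta_prev.ravel()))
    return lr + beta * hypergrad

# Toy usage: adapt a global rate across rounds, with synthetic vectors
# standing in for aggregated client updates (an assumption of this sketch).
rng = np.random.default_rng(0)
lr, beta = 1.0, 1e-2
delta_prev = rng.standard_normal(10)
for _ in range(5):
    delta_t = delta_prev + 0.1 * rng.standard_normal(10)  # correlated rounds
    lr = hypergradient_lr_update(lr, beta, delta_t, delta_prev)
    delta_prev = delta_t
print(f"adapted learning rate: {lr:.3f}")
```

Because successive updates in this toy loop are positively correlated, the rate increases over rounds; if the updates oscillated in direction, the same rule would shrink it, which is the behavior a scheduler of this kind exploits.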