FedZeN: Towards superlinear zeroth-order federated learning via incremental Hessian estimation (2309.17174v1)
Abstract: Federated learning is a distributed learning framework that allows a set of clients to collaboratively train a model under the orchestration of a central server, without sharing raw data samples. Although in many practical scenarios the derivatives of the objective function are not available, only a few works have considered the federated zeroth-order setting, in which functions can only be accessed through a budgeted number of point evaluations. In this work we focus on convex optimization and design the first federated zeroth-order algorithm to estimate the curvature of the global objective, with the purpose of achieving superlinear convergence. We adopt an incremental Hessian estimator whose error norm converges linearly, and we adapt it to the federated zeroth-order setting, sampling the random search directions from the Stiefel manifold for improved performance. In particular, both the gradient and Hessian estimators are built at the central server in a communication-efficient and privacy-preserving way by leveraging synchronized pseudo-random number generators. We provide a theoretical analysis of our algorithm, named FedZeN, proving local quadratic convergence with high probability and global linear convergence up to zeroth-order precision. Numerical simulations confirm the superlinear convergence rate and show that our algorithm outperforms the federated zeroth-order methods available in the literature.
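To make the pipeline sketched in the abstract concrete, the snippet below gives a minimal illustrative sketch of the main ingredients: orthonormal search directions drawn from the Stiefel manifold, zeroth-order finite-difference estimates of directional slope and curvature, an incremental rank-one Hessian update, scalar-only client-to-server communication with a synchronized pseudo-random seed, and a Newton-type step at the server. All function names, update rules, and parameters here (e.g. `client_evaluations`, `server_step`, the smoothing radius `mu`, the damping term) are assumptions made for exposition, not the authors' exact FedZeN algorithm.

```python
import numpy as np

def stiefel_directions(dim, k, seed):
    """k orthonormal directions in R^dim (QR of a Gaussian matrix)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return Q

def client_evaluations(f_local, x, seed, k, mu):
    """A client regenerates the shared directions from the synchronized seed
    and returns only scalar function values (no gradients, no raw data)."""
    U = stiefel_directions(x.size, k, seed)
    return np.array([[f_local(x + mu * U[:, i]),
                      f_local(x),
                      f_local(x - mu * U[:, i])] for i in range(k)])  # (k, 3)

def server_step(x, H, client_vals, seed, k, mu, step=1.0):
    """Server averages client scalars, rebuilds the same directions from the
    seed, forms finite-difference slope/curvature estimates, applies one
    rank-one Hessian update per direction, and takes a Newton-type step."""
    U = stiefel_directions(x.size, k, seed)
    avg = np.mean(client_vals, axis=0)            # average over clients, (k, 3)
    g = np.zeros_like(x)
    for i in range(k):
        fp, f0, fm = avg[i]
        slope = (fp - fm) / (2 * mu)              # directional derivative estimate
        curv = (fp - 2 * f0 + fm) / mu ** 2       # directional curvature estimate
        u = U[:, i]
        g += slope * u
        H = H + (curv - u @ H @ u) * np.outer(u, u)   # incremental rank-one update
    g *= x.size / k                               # rescale for the sampled subspace
    x_next = x - step * np.linalg.solve(H + 1e-8 * np.eye(x.size), g)
    return x_next, H

# Toy usage: two clients holding local quadratic objectives.
if __name__ == "__main__":
    dim, k, mu = 5, 5, 1e-5
    rng = np.random.default_rng(0)
    A = [np.diag(rng.uniform(1, 3, dim)) for _ in range(2)]
    b = [rng.standard_normal(dim) for _ in range(2)]
    clients = [lambda z, Ai=Ai, bi=bi: 0.5 * z @ Ai @ z - bi @ z
               for Ai, bi in zip(A, b)]
    x, H = np.zeros(dim), np.eye(dim)
    for t in range(20):
        seed = 1000 + t                           # synchronized pseudo-random seed
        vals = np.stack([client_evaluations(f, x, seed, k, mu) for f in clients])
        x, H = server_step(x, H, vals, seed, k, mu)
```

Because clients transmit only the averaged scalar evaluations and the server regenerates the directions from the shared seed, the per-round communication cost stays independent of the number of sampled directions' dimension, which is the communication-efficiency and privacy argument made in the abstract.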