FedZeN: Towards superlinear zeroth-order federated learning via incremental Hessian estimation (2309.17174v1)

Published 29 Sep 2023 in cs.LG and math.OC

Abstract: Federated learning is a distributed learning framework that allows a set of clients to collaboratively train a model under the orchestration of a central server, without sharing raw data samples. Although in many practical scenarios the derivatives of the objective function are not available, only a few works have considered the federated zeroth-order setting, in which functions can only be accessed through a budgeted number of point evaluations. In this work we focus on convex optimization and design the first federated zeroth-order algorithm to estimate the curvature of the global objective, with the purpose of achieving superlinear convergence. We take an incremental Hessian estimator whose error norm converges linearly, and we adapt it to the federated zeroth-order setting, sampling the random search directions from the Stiefel manifold for improved performance. In particular, both the gradient and Hessian estimators are built at the central server in a communication-efficient and privacy-preserving way by leveraging synchronized pseudo-random number generators. We provide a theoretical analysis of our algorithm, named FedZeN, proving local quadratic convergence with high probability and global linear convergence up to zeroth-order precision. Numerical simulations confirm the superlinear convergence rate and show that our algorithm outperforms the federated zeroth-order methods available in the literature.
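To make the mechanism described in the abstract concrete, below is a minimal Python sketch of how synchronized pseudo-random number generators can let clients upload only scalar finite differences while the server regenerates the same search directions, assembles a zeroth-order gradient estimate, and incrementally refines a Hessian estimate through rank-one curvature corrections along directions drawn from the Stiefel manifold. This is an illustrative sketch under generic assumptions (the rescaling factor, the Leventhal-Lewis-style rank-one update, and all function names such as client_finite_differences are hypothetical), not the authors' FedZeN implementation.

```python
import numpy as np

def stiefel_directions(dim, k, seed):
    """Draw k orthonormal search directions (a point on the Stiefel manifold St(dim, k)).
    Because the seed is shared, every client and the server regenerate identical directions."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return q  # shape (dim, k), orthonormal columns

def client_finite_differences(local_loss, x, seed, k, mu=1e-4):
    """A client uploads only 2k scalars per round: central first differences (gradient
    information) and second differences (directional curvature) along the shared directions."""
    u = stiefel_directions(x.size, k, seed)
    fx = local_loss(x)
    first, second = np.empty(k), np.empty(k)
    for i in range(k):
        fp = local_loss(x + mu * u[:, i])
        fm = local_loss(x - mu * u[:, i])
        first[i] = (fp - fm) / (2.0 * mu)            # ~ u_i^T grad f(x)
        second[i] = (fp - 2.0 * fx + fm) / mu ** 2   # ~ u_i^T hess f(x) u_i
    return first, second

def server_update(client_scalars, H, x_dim, seed, k):
    """The server regenerates the same directions from the seed, averages the clients'
    scalars, forms a zeroth-order gradient estimate, and refreshes its Hessian estimate
    with a rank-one curvature correction along each direction."""
    u = stiefel_directions(x_dim, k, seed)
    first = np.mean([s[0] for s in client_scalars], axis=0)
    second = np.mean([s[1] for s in client_scalars], axis=0)
    grad_est = (x_dim / k) * u @ first               # rescaled so it is unbiased under uniform direction sampling
    for i in range(k):
        ui = u[:, i]
        H = H + (second[i] - ui @ H @ ui) * np.outer(ui, ui)
    return grad_est, H

# Toy usage: two clients sharing a quadratic objective, one round of estimation.
dim, k, seed = 5, 3, 0
loss = lambda z: 0.5 * z @ np.diag(np.arange(1.0, dim + 1)) @ z
x = np.ones(dim)
scalars = [client_finite_differences(loss, x, seed, k) for _ in range(2)]
grad_est, H_est = server_update(scalars, np.eye(dim), dim, seed, k)
```

In the actual algorithm the resulting estimates would drive a Newton-type update of the global model; the usage lines above only illustrate one round of estimation.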
