Rendering Wireless Environments Useful for Gradient Estimators: A Zero-Order Stochastic Federated Learning Method (2401.17460v2)
Abstract: Cross-device federated learning (FL) is a growing machine learning setting in which multiple edge devices collaborate to train a model without disclosing their raw data. As ever more mobile devices participate in FL applications over the wireless environment, their limited uplink capacity becomes a critical bottleneck for practical deployment. In this work, we propose a novel doubly communication-efficient zero-order (ZO) method with a one-point gradient estimator that replaces the communication of long vectors with scalar values and that harnesses the nature of the wireless channel, removing the need to know the channel state coefficient. It is the first method to incorporate the wireless channel into the learning algorithm itself, rather than spending resources to estimate the channel and cancel its impact. We then offer a thorough analysis of the proposed zero-order federated learning (ZOFL) framework and prove that our method converges *almost surely*, a novel result in nonconvex ZO optimization. We further prove a convergence rate of $O(\frac{1}{\sqrt[3]{K}})$ in the nonconvex setting, and we finally demonstrate the potential of our algorithm with experimental results.
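The core mechanism is easy to illustrate. With a one-point estimator, each device evaluates its local loss at a single randomly perturbed point and uploads only that scalar; the server multiplies the received, channel-distorted scalar by the perturbation direction to form a stochastic gradient estimate, so the unknown fading coefficient is absorbed into the estimator instead of being estimated and inverted. The sketch below shows this idea on a toy least-squares problem; the function names, channel model, and step sizes are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def device_loss(x, data):
    # Hypothetical local objective: least squares on the device's private data.
    A, b = data
    r = A @ x - b
    return 0.5 * float(np.mean(r ** 2))

def zofl_round(x, devices, gamma=0.05, lr=1e-4):
    """One communication round of a one-point ZO update (illustrative sketch).

    Each device evaluates its loss at a single perturbed point and uploads
    only that scalar. The scalar arrives scaled by an unknown fading gain h
    plus noise; rather than estimating and inverting h, the server folds it
    into the stochastic gradient estimate.
    """
    d = x.size
    u = rng.standard_normal(d)  # perturbation direction, shared e.g. via a common seed
    g = np.zeros(d)
    for data in devices:
        y = device_loss(x + gamma * u, data)  # single (one-point) function query
        h = abs(rng.normal(1.0, 0.1))         # unknown multiplicative channel gain
        noise = 1e-3 * rng.standard_normal()  # additive receiver noise
        received = h * y + noise              # the only scalar sent on the uplink
        g += (d / gamma) * received * u       # classic one-point ZO estimate
    # One-point estimates are high-variance, hence the small step size.
    return x - lr * g / len(devices)

# Toy usage: three devices, each holding private least-squares data.
devices = [(rng.standard_normal((20, 5)), rng.standard_normal(20))
           for _ in range(3)]
x = np.zeros(5)
for _ in range(2000):
    x = zofl_round(x, devices)
```

The one-point form is what makes the scalar-only uplink plausible: a two-point estimator would require differencing two function evaluations, which unknown and differing channel gains would corrupt, whereas here the gain h acts as just one more multiplicative random variable inside the estimator, consistent with the abstract's claim that the channel is built into the learning algorithm rather than compensated away.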
Authors: Elissa Mhanna, Mohamad Assaad