
Adaptive Federated Learning Over the Air (2403.06528v1)

Published 11 Mar 2024 in cs.LG, cs.IT, cs.NI, and math.IT

Abstract: We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training. This approach capitalizes on the inherent superposition property of wireless channels, facilitating fast and scalable parameter aggregation. Meanwhile, it enhances the robustness of the model training process by dynamically adjusting the stepsize in accordance with the global gradient update. We derive the convergence rate of the training algorithms, encompassing the effects of channel fading and interference, for a broad spectrum of nonconvex loss functions. Our analysis shows that the AdaGrad-based algorithm converges to a stationary point at the rate of $\mathcal{O}( \ln(T) / T^{1 - \frac{1}{\alpha}} )$, where $\alpha$ represents the tail index of the electromagnetic interference. This result indicates that the level of heavy-tailedness in the interference distribution plays a crucial role in the training efficiency: the heavier the tail, the slower the algorithm converges. In contrast, an Adam-like algorithm converges at the $\mathcal{O}( 1/T )$ rate, demonstrating its advantage in expediting the model training process. We conduct extensive experiments that corroborate our theoretical findings and affirm the practical efficacy of our proposed federated adaptive gradient methods.
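The mechanism the abstract describes can be illustrated with a minimal sketch: each client transmits its local gradient, the wireless channel superimposes the transmissions into one noisy sum (the over-the-air aggregation step), and the server applies an AdaGrad-style per-coordinate adaptive stepsize to the received global gradient. This is not the paper's algorithm verbatim; the toy quadratic loss, the Gaussian fading model, and the Student-t draw standing in for heavy-tailed ($\alpha$-stable) interference are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def client_gradients(w, data):
    # Per-client stochastic gradient of a toy quadratic loss ||x - w||^2 / 2.
    return [np.mean(w - x, axis=0) for x in data]

def over_the_air_sum(grads, fading_std=0.1, noise_std=0.01, alpha=1.8):
    # The channel superimposes all transmitted signals: the server receives
    # one noisy sum rather than individual gradients. Heavy-tailed
    # interference is mimicked with a Student-t draw (a stand-in for the
    # alpha-stable model; alpha here plays the role of the tail index).
    faded = sum(g * (1.0 + fading_std * rng.standard_normal()) for g in grads)
    interference = noise_std * rng.standard_t(df=alpha, size=faded.shape)
    return faded + interference

def ota_adagrad(data, dim, rounds=200, eta=0.5, eps=1e-8):
    w = np.zeros(dim)
    accum = np.zeros(dim)  # running sum of squared *global* gradients
    for _ in range(rounds):
        g = over_the_air_sum(client_gradients(w, data)) / len(data)
        accum += g * g
        # AdaGrad: per-coordinate stepsize shrinks with accumulated energy.
        w -= eta * g / (np.sqrt(accum) + eps)
    return w

# Toy run: 4 clients whose local optima average to roughly [1, -1].
data = [rng.normal(loc=[1.0, -1.0], scale=0.3, size=(32, 2)) for _ in range(4)]
w = ota_adagrad(data, dim=2)
```

An Adam-like variant would replace the raw accumulator with exponential moving averages of the gradient and its square, which (per the abstract) removes the tail-index dependence from the convergence rate.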
