
Efficient Wireless Federated Learning via Low-Rank Gradient Factorization (2401.07496v2)

Published 15 Jan 2024 in cs.IT, cs.LG, eess.SP, and math.IT

Abstract: This paper presents a novel gradient compression method for federated learning (FL) in wireless systems. The method centers on a low-rank matrix factorization strategy for local gradient compression, based on one iteration of a distributed Jacobi successive convex approximation (SCA) at each FL round. The low-rank approximation obtained at one round serves as a "warm start" initialization for Jacobi SCA in the next round. A new protocol termed over-the-air low-rank compression (Ota-LC), which combines this gradient compression method with over-the-air computation and error feedback, is shown to have lower computation cost and lower communication overhead than existing benchmarks while matching their inference performance. For example, when targeting a test accuracy of 70% on the CIFAR-10 dataset, Ota-LC reduces total communication cost by at least 33% compared to benchmark schemes.
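To make the compression step concrete, the following is a minimal Python sketch of the kind of procedure the abstract describes: a per-device low-rank factorization of the gradient matrix using one Jacobi-style parallel update per round, warm-started from the previous round's factors, with an error-feedback residual carried across rounds. The names (`jacobi_sca_step`, `LowRankCompressor`), the Tikhonov regularization, and the least-squares updates are illustrative assumptions, not the paper's actual Ota-LC implementation; in particular, the over-the-air analog aggregation and wireless channel effects are not modeled here.

```python
import numpy as np

def jacobi_sca_step(G, U, V, reg=1e-6):
    """One Jacobi-style iteration: update both factors in parallel,
    each solving a regularized least-squares subproblem against the
    *previous* iterate of the other factor (parallel, not Gauss-Seidel)."""
    r = U.shape[1]
    I = reg * np.eye(r)
    U_new = G @ V @ np.linalg.inv(V.T @ V + I)
    V_new = G.T @ U @ np.linalg.inv(U.T @ U + I)
    return U_new, V_new

class LowRankCompressor:
    """Hypothetical per-device compressor: rank-r factorization of the
    local gradient with warm start and error feedback."""
    def __init__(self, m, n, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Random factors for round 0; later rounds warm-start from these.
        self.U = rng.standard_normal((m, rank)) / np.sqrt(rank)
        self.V = rng.standard_normal((n, rank)) / np.sqrt(rank)
        self.residual = np.zeros((m, n))  # error-feedback memory

    def compress(self, grad):
        # Error feedback: fold in the part of last round's gradient
        # that the low-rank approximation failed to transmit.
        target = grad + self.residual
        # Single Jacobi SCA iteration, warm-started from last round.
        self.U, self.V = jacobi_sca_step(target, self.U, self.V)
        approx = self.U @ self.V.T
        self.residual = target - approx  # store the untransmitted error
        return self.U, self.V            # low-rank factors to be sent

# Toy usage: compress a 256x128 gradient matrix at rank 8.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    G = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 128))
    comp = LowRankCompressor(256, 128, rank=8)
    U, V = comp.compress(G)
    err = np.linalg.norm(G - U @ V.T) / np.linalg.norm(G)
    print(f"relative approximation error: {err:.3f}")
```

Communicating the factors U (m x r) and V (n x r) instead of the full m x n gradient is what yields the communication savings: the per-round payload drops from mn to r(m + n) entries, which is substantial when r is much smaller than min(m, n).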

