
Bayesian Federated Model Compression for Communication and Computation Efficiency (2404.07532v1)

Published 11 Apr 2024 in cs.LG, cs.AI, and cs.DC

Abstract: In this paper, we investigate Bayesian model compression in federated learning (FL) to construct sparse models that achieve both communication and computation efficiency. We propose a decentralized Turbo variational Bayesian inference (D-Turbo-VBI) FL framework in which we first propose a hierarchical sparse prior to promote a clustered sparse structure in the weight matrix. Then, by carefully integrating message passing and VBI within a decentralized turbo framework, we propose the D-Turbo-VBI algorithm, which can (i) reduce both upstream and downstream communication overhead during federated training, and (ii) reduce the computational complexity during local inference. Additionally, we establish the convergence property of the proposed D-Turbo-VBI algorithm. Simulation results show the significant gain of our proposed algorithm over the baselines in reducing both the communication overhead during federated training and the computational complexity of the final model.
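
The two ideas highlighted in the abstract are a hierarchical sparse prior that induces clustered sparsity in the weight matrix, and compression driven by posterior statistics so that both the uplink message and the local inference cost shrink. The sketch below illustrates only that general idea in NumPy; it is not the paper's D-Turbo-VBI algorithm, and every distribution, threshold, and message-size estimate in it is an illustrative assumption.

```python
# Minimal sketch (not the paper's D-Turbo-VBI algorithm): a hierarchical sparse
# prior induces *clustered* sparsity in a toy weight matrix, and pruning by
# posterior statistics shrinks both the uplink message and inference cost.
# All priors, posteriors, and thresholds below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

rows, cols = 16, 32          # toy weight matrix (e.g., one dense layer)
p_group_active = 0.3         # prob. that a whole row (cluster) is active
p_within_active = 0.7        # prob. that an element is active given its row is

# Hierarchical sparse prior: a row-level support variable gates element-level support.
row_support = rng.random(rows) < p_group_active                      # s_i ~ Bernoulli
elem_support = (rng.random((rows, cols)) < p_within_active) & row_support[:, None]
W_true = np.where(elem_support, rng.normal(0.0, 1.0, (rows, cols)), 0.0)

# Stand-in for a client's variational posterior after local training:
# a Gaussian mean/variance per weight (here just noisy observations of W_true).
post_mean = W_true + rng.normal(0.0, 0.05, W_true.shape)
post_var = np.full(W_true.shape, 0.05 ** 2)

# Prune by a signal-to-noise style rule (an assumed heuristic, not the paper's rule).
snr = np.abs(post_mean) / np.sqrt(post_var)
keep = snr > 3.0
W_sparse = np.where(keep, post_mean, 0.0)

dense_msg = W_true.size                 # floats sent without any compression
sparse_msg = int(keep.sum()) + rows     # kept values + a rough per-row support cost
print(f"uplink floats: dense={dense_msg}, sparse~{sparse_msg}")
print(f"rows fully pruned (skippable at inference): {(~keep.any(axis=1)).sum()}/{rows}")
```

Because entire rows can be inactive under the row-level support variable, whole clusters of weights drop out together, which is what makes the resulting sparsity useful for reducing computation at inference time rather than only reducing the number of nonzeros.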
