
Communication-Efficient Federated Learning via Regularized Sparse Random Networks (2309.10834v2)

Published 19 Sep 2023 in cs.LG, cs.CV, cs.DC, and cs.DS

Abstract: This work presents a new method for enhancing communication efficiency in stochastic Federated Learning that trains over-parameterized random networks. In this setting, a binary mask is optimized instead of the model weights, which are kept fixed. The mask characterizes a sparse sub-network that generalizes as well as a smaller target network. Importantly, sparse binary masks are exchanged rather than the floating-point weights exchanged in traditional federated learning, reducing communication cost to at most 1 bit per parameter (Bpp). We show that previous state-of-the-art stochastic methods fail to find sparse networks that reduce communication and storage overhead when using consistent loss objectives. To address this, we propose adding a regularization term to local objectives that acts as a proxy for the entropy of the transmitted masks, thereby encouraging sparser solutions by eliminating redundant features across sub-networks. Extensive empirical experiments demonstrate improvements in communication and memory efficiency of up to five orders of magnitude compared to the literature, with minimal degradation in validation accuracy in some instances.
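As a rough illustration of the setting described in the abstract, the sketch below (PyTorch/NumPy, using a toy linear model) trains per-parameter mask scores over frozen random weights with a straight-through estimator, adds a sparsity-inducing stand-in for the mask-entropy regularizer, and packs the resulting binary mask into at most 1 bit per parameter. Names such as lambda_reg and the exact form of the regularizer are assumptions for illustration, not the paper's notation.

    import torch
    import torch.nn.functional as F
    import numpy as np

    torch.manual_seed(0)

    dim = 64                                   # illustrative input dimension
    w = torch.randn(dim)                       # fixed random weights, never updated
    s = torch.zeros(dim, requires_grad=True)   # trainable per-parameter mask scores

    def masked_forward(x):
        """Toy linear model whose effective weights are mask * w."""
        p = torch.sigmoid(s)
        hard = torch.bernoulli(p.detach())     # stochastic binary mask sample
        mask = hard + p - p.detach()           # straight-through estimator
        return (x * (mask * w)).sum(dim=-1)

    lambda_reg = 0.1                           # illustrative regularization weight
    opt = torch.optim.SGD([s], lr=0.5)

    x = torch.randn(256, dim)
    y = torch.randn(256)
    for _ in range(200):
        opt.zero_grad()
        p = torch.sigmoid(s)
        # Task loss plus a sparsity-inducing proxy for the mask entropy: here the
        # mean mask probability (an assumed form; the paper's regularizer may differ).
        loss = F.mse_loss(masked_forward(x), y) + lambda_reg * p.mean()
        loss.backward()
        opt.step()

    # What a client would transmit: the hard mask, packed to at most 1 bit per parameter.
    final_mask = (torch.sigmoid(s.detach()) > 0.5).numpy()
    payload = np.packbits(final_mask)          # ceil(dim / 8) bytes on the wire
    print(f"kept {final_mask.sum()} of {dim} parameters, payload {payload.nbytes} bytes")

Pushing mask probabilities toward zero both sparsifies the sub-network and lowers the entropy of the transmitted mask, which is the intuition behind the regularization term described above; the server would aggregate the received binary masks rather than floating-point weight updates.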

