Communication-Efficient Federated Learning via Regularized Sparse Random Networks (2309.10834v2)
Abstract: This work presents a new method for enhancing communication efficiency in stochastic Federated Learning that trains over-parameterized random networks. In this setting, a binary mask is optimized instead of the model weights, which are kept fixed. The mask characterizes a sparse sub-network that is able to generalize as well as a smaller target network. Importantly, sparse binary masks are exchanged rather than the floating-point weights used in traditional federated learning, reducing communication cost to at most 1 bit per parameter (Bpp). We show that previous state-of-the-art stochastic methods fail to find sparse networks that can reduce the communication and storage overhead using consistent loss objectives. To address this, we propose adding a regularization term to local objectives that acts as a proxy of the transmitted masks' entropy, thereby encouraging sparser solutions by eliminating redundant features across sub-networks. Extensive empirical experiments demonstrate significant improvements in communication and memory efficiency of up to five orders of magnitude compared to the literature, with minimal degradation in validation accuracy in some instances.
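To make the "at most 1 Bpp" figure concrete, the following is a minimal sketch, not the authors' implementation. It assumes each mask entry is drawn independently from a Bernoulli distribution and that the mask is sent with an ideal entropy coder, so the cost per parameter is the binary entropy of the mask density. The function names (`binary_entropy`, `sample_mask`, `mask_bpp`, `regularized_loss`) and the density penalty weighted by `lam` are hypothetical illustrations; the paper's actual regularizer may differ.

```python
import math
import random


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) source; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def sample_mask(probs, rng):
    """Draw one binary mask, keeping weight i with probability probs[i]."""
    return [1 if rng.random() < q else 0 for q in probs]


def mask_bpp(mask):
    """Estimated bits per parameter under ideal entropy coding of the mask."""
    density = sum(mask) / len(mask)
    return binary_entropy(density)


def regularized_loss(task_loss, probs, lam):
    """Assumed form: task loss plus a penalty on the expected mask density."""
    return task_loss + lam * sum(probs) / len(probs)


rng = random.Random(0)
dense_mask = sample_mask([0.5] * 1000, rng)    # about half the weights kept
sparse_mask = sample_mask([0.05] * 1000, rng)  # about 5% of the weights kept
print(f"dense Bpp  = {mask_bpp(dense_mask):.3f}")
print(f"sparse Bpp = {mask_bpp(sparse_mask):.3f}")
```

The binary entropy peaks at exactly 1 bit when half the entries are ones, which is where the 1 Bpp upper bound comes from. Pushing the mask density toward zero lowers the entropy, which is why a sparsity-encouraging term in the local objective can reduce the communication cost below that bound.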