Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity (2312.13380v1)
Abstract: Motivated by the high resource costs of centralized machine learning schemes as well as data privacy concerns, federated learning (FL) emerged as an efficient alternative that relies on aggregating locally trained models rather than collecting clients' potentially private data. In practice, available resources and data distributions vary from one client to another, creating an inherent system heterogeneity that degrades the performance of conventional FL algorithms. In this work, we present a federated quantization-based self-supervised learning scheme (Fed-QSSL) designed to address heterogeneity in FL systems. On the client side, we tackle data heterogeneity by leveraging distributed self-supervised learning while utilizing low-bit quantization to satisfy the constraints imposed by local infrastructure and limited communication resources. On the server side, Fed-QSSL deploys de-quantization, weighted aggregation, and re-quantization, ultimately creating models personalized to both the data distribution and the specific infrastructure of each client's device. We validated the proposed algorithm on real-world datasets, demonstrating its efficacy, and theoretically analyzed the impact of low-bit training on the convergence and robustness of the learned models.
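As a rough illustration of the server-side pipeline described in the abstract (de-quantization, weighted aggregation, re-quantization), the NumPy sketch below aggregates low-bit client models and re-quantizes the result to each client's own bitwidth. The uniform symmetric quantizer, the FedAvg-style weighting by local dataset size, and the function names (`quantize`, `dequantize`, `server_round`) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np


def quantize(weights, bits):
    """Uniform symmetric quantization to `bits` bits (illustrative quantizer,
    not necessarily the one used in the paper)."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 levels for 8-bit
    scale = float(np.max(np.abs(weights))) / levels
    if scale == 0.0:
        scale = 1.0                           # avoid division by zero for all-zero tensors
    codes = np.clip(np.round(weights / scale), -levels, levels).astype(np.int32)
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to full precision."""
    return codes.astype(np.float32) * scale


def server_round(client_updates, client_bits, client_sizes):
    """One server step in the spirit of Fed-QSSL:
    de-quantize -> weighted aggregation -> re-quantize per client bitwidth."""
    # 1. de-quantize each client's low-bit model
    full_precision = [dequantize(codes, scale) for codes, scale in client_updates]

    # 2. aggregate with weights proportional to local dataset size (FedAvg-style assumption)
    total = float(sum(client_sizes))
    aggregate = sum(w * (n / total) for w, n in zip(full_precision, client_sizes))

    # 3. re-quantize the aggregate to each client's own bitwidth
    return [quantize(aggregate, bits) for bits in client_bits]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # three clients with heterogeneous bitwidths and dataset sizes
    client_bits, client_sizes = [8, 4, 2], [1000, 500, 2000]
    local_models = [rng.normal(size=16).astype(np.float32) for _ in client_bits]
    updates = [quantize(w, b) for w, b in zip(local_models, client_bits)]
    personalized = server_round(updates, client_bits, client_sizes)
    for bits, (codes, scale) in zip(client_bits, personalized):
        print(f"{bits}-bit model: codes[:4]={codes[:4]}, scale={scale:.4f}")
```

In an actual deployment the aggregation would operate layer-wise over full model state dictionaries rather than a single flat tensor; the sketch only captures the de-quantize, aggregate, re-quantize structure of the server step.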