Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning (2308.11464v2)
Abstract: Federated learning (FL) inevitably confronts the challenge of system heterogeneity in practical scenarios. To enhance the ability of most model-homogeneous FL methods to handle system heterogeneity, we propose a training scheme that extends their capabilities to cope with this challenge. In this paper, we begin with a detailed exploration of homogeneous and heterogeneous FL settings and make three key observations: (1) client performance correlates positively with layer similarities, (2) shallow layers exhibit higher similarities than deep layers, and (3) smoother gradient distributions indicate higher layer similarities. Building upon these observations, we propose InCo Aggregation, which leverages internal cross-layer gradients, a mixture of gradients from shallow and deep layers within a server model, to increase the similarity of the deep layers without requiring additional communication between clients. Furthermore, our method can be tailored to model-homogeneous FL methods such as FedAvg, FedProx, FedNova, Scaffold, and MOON, extending their capabilities to handle system heterogeneity. Extensive experimental results validate the effectiveness of InCo Aggregation, spotlighting internal cross-layer gradients as a promising avenue for improving performance in heterogeneous FL.
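The mechanics behind these observations and the aggregation rule can be made concrete with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: `linear_cka` is the standard linear centered kernel alignment (Kornblith et al., 2019) that underlies the layer-similarity observations, and `inco_mix_gradients` shows one plausible reading of internal cross-layer gradients, blending each deep layer's server-side gradient with that of an earlier, shape-compatible layer. The mixing ratio `alpha` and the shape-matching rule for pairing layers are illustrative assumptions.

```python
import torch
from typing import List, Optional


def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear centered kernel alignment (CKA) between two activation
    matrices of shape (num_samples, num_features). Higher values mean
    more similar representations."""
    X = X - X.mean(dim=0, keepdim=True)  # center each feature column
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (X.T @ Y).norm() ** 2         # ||X^T Y||_F^2
    return hsic / ((X.T @ X).norm() * (Y.T @ Y).norm())


def inco_mix_gradients(layer_grads: List[torch.Tensor],
                       alpha: float = 0.5) -> List[torch.Tensor]:
    """Illustrative server-side cross-layer gradient mixing.

    `layer_grads` holds per-layer gradients ordered shallow -> deep.
    Each deep layer's gradient is blended with the gradient of the
    shallowest earlier layer that has a compatible shape, nudging deep
    layers toward the smoother gradient statistics observed in shallow
    layers. `alpha` (an assumed hyperparameter) controls how much of
    the original deep-layer gradient is kept; layers without a
    compatible donor are left untouched.
    """
    mixed = []
    for i, g_deep in enumerate(layer_grads):
        donor: Optional[torch.Tensor] = next(
            (g for g in layer_grads[:i] if g.shape == g_deep.shape), None)
        if donor is None:
            mixed.append(g_deep)
        else:
            mixed.append(alpha * g_deep + (1.0 - alpha) * donor)
    return mixed
```

In this reading, the server mixes gradients entirely within its own model before applying the aggregated update, which is consistent with the abstract's claim that deep-layer similarity can be raised without any extra communication between clients.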
- FedRolex: Model-heterogeneous federated learning with rolling sub-model extraction. Advances in Neural Information Processing Systems, 35:29677–29690, 2022.
- Sergio A. Alvarez. Gaussian RBF centered kernel alignment (CKA) in the large-bandwidth limit. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):6587–6593, 2023. doi: 10.1109/TPAMI.2022.3216518.
- Joint superposition coding and training for federated learning over multi-width neural networks. In IEEE INFOCOM 2022-IEEE Conference on Computer Communications, pp. 1729–1738. IEEE, 2022.
- Duality in vector optimization. Springer Science & Business Media, 2009.
- Convex optimization. Cambridge University Press, 2004.
- Expanding the reach of federated learning by reducing client resource requirements. arXiv preprint arXiv:1812.07210, 2018.
- FedHe: Heterogeneous models and communication-efficient federated learning. IEEE International Conference on Mobility, Sensing and Networking (MSN 2021), 2021.
- Yun-Hin Chan and Edith C-H Ngai. Exploiting features and logits in heterogeneous federated learning. arXiv preprint arXiv:2210.15527, 2022.
- Communication-efficient federated learning with adaptive parameter freezing. In 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS), pp. 1–11. IEEE, 2021.
- Data-free learning of student networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3514–3522, 2019.
- Algorithms for learning kernels based on centered alignment. The Journal of Machine Learning Research, 13(1):795–828, 2012.
- CINIC-10 is not ImageNet or CIFAR-10. arXiv preprint arXiv:1810.03505, 2018.
- HeteroFL: Computation and communication efficient federated learning for heterogeneous clients. In International Conference on Learning Representations, 2021.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Robust federated learning with noisy and heterogeneous clients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10072–10081, 2022.
- A survey on heterogeneous federated learning. arXiv preprint arXiv:2210.04505, 2022.
- Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
- Knowledge distillation: A survey. International Journal of Computer Vision, 129(6):1789–1819, March 2021. doi: 10.1007/s11263-021-01453-z.
- Group knowledge transfer: Federated learning of large cnns at the edge. Advances in Neural Information Processing Systems, 33:14068–14080, 2020.
- Identity mappings in deep residual networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, pp. 630–645. Springer, 2016.
- Distilling the knowledge in a neural network. NIPS Deep Learning and Representation Learning Workshop, 2015.
- Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- FjORD: Fair and accurate federated learning under heterogeneous targets with ordered dropout. Advances in Neural Information Processing Systems, 34:12876–12889, 2021.
- Learn from others and be yourself in heterogeneous federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10143–10153, 2022.
- Evaluating gradient inversion attacks and defenses in federated learning. Advances in Neural Information Processing Systems, 34:7232–7241, 2021.
- ScaleFL: Resource-adaptive federated learning with heterogeneous clients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24532–24541, 2023.
- Factorized-FL: Personalized federated learning with parameter factorization & similarity matching. In Advances in Neural Information Processing Systems, 2022.
- Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2):1–210, 2021.
- Scaffold: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pp. 5132–5143. PMLR, 2020.
- Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun (eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
- Similarity of neural network representations revisited. In International Conference on Machine Learning, pp. 3519–3529. PMLR, 2019.
- Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
- FedMD: Heterogenous federated learning via model distillation. NeurIPS Workshop on Federated Learning for Data Privacy and Confidentiality, 2019.
- Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10713–10722, 2021a.
- Federated learning on non-iid data silos: An experimental study. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 965–978. IEEE, 2022.
- Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50–60, 2020a.
- Federated optimization in heterogeneous networks. Proceedings of the 3rd MLSys Conference, 2020b.
- Convergent learning: Do different neural networks learn the same representations? arXiv preprint arXiv:1511.07543, 2015.
- FedH2L: Federated learning with model and statistical heterogeneity. arXiv preprint arXiv:2101.11296, 2021b.
- Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 33:2351–2363, 2020.
- No one left behind: Inclusive federated learning over heterogeneous devices. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3398–3406, 2022.
- No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. Advances in Neural Information Processing Systems, 34:5972–5984, 2021.
- Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. PMLR, 2017.
- Agnostic federated learning. In International Conference on Machine Learning, pp. 4615–4625. PMLR, 2019.
- Insights on representational similarity in neural networks with canonical correlation. Advances in Neural Information Processing Systems, 31, 2018.
- Zero-shot knowledge distillation in deep networks. In International Conference on Machine Learning, pp. 4743–4751. PMLR, 2019.
- Reading digits in natural images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in Neural Information Processing Systems, 30, 2017.
- Do vision transformers see like convolutional neural networks? Advances in Neural Information Processing Systems, 34:12116–12128, 2021.
- Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
- FedAUX: Leveraging unlabeled auxiliary data in federated learning. IEEE Transactions on Neural Networks and Learning Systems, 2021.
- Federated mutual learning. arXiv preprint arXiv:2006.16765, 2020.
- FedProto: Federated prototype learning across heterogeneous clients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 8432–8440, 2022.
- Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in Neural Information Processing Systems, 33:7611–7623, 2020.
- Towards understanding learning representations: To what extent do different neural networks learn the same representation. Advances in Neural Information Processing Systems, 31, 2018.
- Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
- Asynchronous federated optimization. 12th Annual Workshop on Optimization for Machine Learning, 2020.
- On layer normalization in the transformer architecture. In International Conference on Machine Learning, pp. 10524–10533. PMLR, 2020.
- Universally slimmable networks and improved training techniques. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1803–1811, 2019.
- Radio2Text: Streaming speech recognition using mmWave radio signals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 7(3), September 2023. doi: 10.1145/3610873.
- Federated learning with non-iid data. arXiv preprint arXiv:1806.00582, 2018.
- Data-free knowledge distillation for heterogeneous federated learning. In International Conference on Machine Learning, pp. 12878–12889. PMLR, 2021.