Federated Learning of Large Models at the Edge via Principal Sub-Model Training (2208.13141v3)
Abstract: Federated Learning (FL) is emerging as a popular, promising decentralized learning framework that enables collaborative training among clients without requiring them to share private data with each other or with a centralized server. However, because many edge clients lack sufficient computing, memory, or communication capacity, federated learning of large models still faces significant bottlenecks. To keep such weak but crucial clients in the loop, prior works either consider a heterogeneous-client setting, in which clients train models of different sizes, or offload training to the server. However, the heterogeneous-client setting still requires some clients to train the full model, which conflicts with the resource-constrained setting, while offloading breaks FL's privacy promises by sharing intermediate representations or labels with the server. To overcome these limitations, we formulate a realistic but much less explored cross-device FL setting in which no client can train a full large model and no client is willing to share any intermediate information with the remote server. Under this formulation, we develop a principal sub-model (PriSM) training methodology that collaboratively trains a full large model while assigning each client a small sub-model that is a probabilistic low-rank approximation of the full server model. To create sub-models, PriSM first performs principal kernel analysis in the orthogonal kernel space to obtain the importance of each kernel. It then adopts a novel importance-aware sampling process to select a subset of kernels (i.e., a kernel with higher importance is assigned a higher sampling probability). This sampling process ensures that each sub-model remains a low-rank approximation of the full model, while all sub-models together achieve nearly full coverage of the principal kernels.
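As an illustration of the importance-aware sampling step described in the abstract, the sketch below decomposes a weight matrix into orthogonal kernels via SVD and samples a subset with probability tied to each kernel's importance. This is not the authors' implementation: taking importance proportional to the singular values, sampling without replacement, and the `sample_principal_kernels` / `keep_ratio` names are all assumptions made for this minimal example.

```python
import numpy as np

def sample_principal_kernels(W, keep_ratio=0.5, rng=None):
    """Minimal sketch of importance-aware kernel sampling.

    W: 2-D weight matrix of shape (out_features, in_features); a conv kernel
       can be reshaped to this form, e.g. (out_channels, in_channels * k * k).
    keep_ratio: fraction of orthogonal kernels a client keeps (hypothetical knob).
    Returns the sampled singular triplets, i.e. one client's probabilistic
    low-rank approximation of W.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Principal kernel analysis: decompose W into orthogonal kernels via SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)

    # Importance of each orthogonal kernel; here assumed proportional to its
    # singular value (the abstract only states "importance-aware").
    probs = s / s.sum()

    # Sample a subset of kernels without replacement: high-importance kernels
    # are more likely to be chosen, but low-importance ones still have a
    # chance, so sub-models across clients jointly cover the principal space.
    k = max(1, int(keep_ratio * len(s)))
    idx = rng.choice(len(s), size=k, replace=False, p=probs)

    return U[:, idx], s[idx], Vt[idx, :]


if __name__ == "__main__":
    W = np.random.randn(64, 64 * 3 * 3)          # e.g. a reshaped 3x3 conv layer
    U_sub, s_sub, Vt_sub = sample_principal_kernels(W, keep_ratio=0.25)
    W_approx = (U_sub * s_sub) @ Vt_sub          # low-rank sub-model held by one client
    print(W_approx.shape, len(s_sub))
```

In this reading, each client trains only its sampled factors, and the server aggregates the updated kernels back into the full model; the randomness of the sampling is what spreads coverage of the principal kernels across clients.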
Authors: Yue Niu, Saurav Prakash, Souvik Kundu, Sunwoo Lee, Salman Avestimehr