When Foresight Pruning Meets Zeroth-Order Optimization: Efficient Federated Learning for Low-Memory Devices (2405.04765v1)
Abstract: Although Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design, it fails to work on low-memory AIoT devices due to its heavy memory usage. To address this problem, various federated pruning methods have been proposed to reduce memory usage during inference. However, few of them substantially mitigate the memory burden during pruning and training. As an alternative, zeroth-order or backpropagation-free (BP-Free) methods can partially alleviate memory consumption, but they scale poorly and incur large computation overheads, since both the gradient estimation error and the number of floating-point operations (FLOPs) grow with the dimensionality of the model parameters. In this paper, we propose a federated foresight pruning method based on the Neural Tangent Kernel (NTK) that integrates seamlessly with federated BP-Free training frameworks. We approximate the federated NTK using the clients' local NTK matrices, and we show that the data-free property of our method substantially reduces the approximation error under extreme data heterogeneity. Because our approach improves the performance of the vanilla BP-Free method with fewer FLOPs and genuinely relieves memory pressure during both training and inference, it makes FL friendlier to low-memory devices. Comprehensive experimental results from simulation- and real-test-bed-based platforms show that our federated foresight-pruning method not only preserves the representational ability of the dense model with a memory reduction of up to 9x, but also boosts the performance of the vanilla BP-Free method with dramatically fewer FLOPs.
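To make the memory argument concrete, the following is a minimal sketch of the generic two-point zeroth-order gradient estimator that BP-Free training builds on: the gradient is approximated from forward evaluations along random directions, so no backpropagation (and hence no stored activations) is required. This is a textbook illustration under our own naming, not the paper's implementation; it also shows why the estimation error grows with parameter dimensionality, since the variance scales with the number of parameters per random direction.

```python
import numpy as np

def zo_gradient(f, theta, mu=1e-3, n_samples=4000, rng=None):
    """Two-point zeroth-order estimate of grad f at theta.

    Averages directional finite differences along random Gaussian
    directions u: ((f(theta + mu*u) - f(theta - mu*u)) / (2*mu)) * u.
    Only forward evaluations of f are used, so no activation storage
    is needed -- the source of the memory savings in BP-Free training.
    """
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = rng.standard_normal(theta.shape)
        grad += (f(theta + mu * u) - f(theta - mu * u)) / (2 * mu) * u
    return grad / n_samples

# Toy quadratic loss f(t) = ||t||^2, whose true gradient is 2 * theta.
theta = np.array([1.0, -2.0, 0.5])
g = zo_gradient(lambda t: float(np.sum(t ** 2)), theta)
```

Note that even for this 3-parameter toy problem, thousands of random directions are needed for a low-variance estimate; for a full neural network the required number of samples (and hence FLOPs) grows with the parameter count, which is exactly the scaling problem the paper's foresight pruning is meant to mitigate.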
Authors: Pengyu Zhang, Yingjie Liu, Yingbo Zhou, Xiao Du, Xian Wei, Ting Wang, Mingsong Chen