Distributed Pruning Towards Tiny Neural Networks in Federated Learning (2212.01977v2)
Abstract: Neural network pruning is an essential technique for reducing the size and complexity of deep neural networks, enabling large-scale models on devices with limited resources. However, existing pruning approaches rely heavily on training data to guide the pruning strategy, making them ineffective for federated learning over distributed and confidential datasets. Moreover, the memory- and computation-intensive pruning process becomes infeasible for resource-constrained devices in federated learning. To address these challenges, we propose FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computation-constrained devices. FedTiny introduces two key modules that adaptively search for coarsely and finely pruned specialized models matched to the deployment scenario, using only sparse and inexpensive local computation. First, an adaptive batch normalization selection module mitigates pruning biases caused by the heterogeneity of local data. Second, a lightweight progressive pruning module further prunes the model under strict memory and computational budgets, determining the pruning policy for each layer gradually rather than evaluating the overall model structure at once. Experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art approaches, particularly when compressing deep models into extremely sparse tiny models. Compared with state-of-the-art methods, FedTiny improves accuracy by 2.61% while reducing computational cost by 95.91% and memory footprint by 94.01%.
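The progressive pruning idea, determining each layer's pruning ratio step by step rather than scoring the whole model at once, can be illustrated with a minimal magnitude-based sketch in NumPy. The linear sparsity schedule, the per-layer magnitude criterion, and the function names below are illustrative assumptions, not FedTiny's actual pruning policy.

```python
import numpy as np

def progressive_magnitude_prune(layer_weights, target_sparsity, num_steps=5):
    """Gradually increase per-layer sparsity over several steps instead of
    committing to a full-model pruning policy in one shot.
    Generic magnitude-based sketch; not the FedTiny policy itself."""
    masks = [np.ones_like(w) for w in layer_weights]
    for step in range(1, num_steps + 1):
        # Linear schedule: ramp sparsity up to the target over the steps.
        sparsity = target_sparsity * step / num_steps
        for i, w in enumerate(layer_weights):
            k = int(sparsity * w.size)  # number of weights to remove in this layer
            if k == 0:
                continue
            # Zero out the k smallest-magnitude weights; keep the rest.
            threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
            masks[i] = (np.abs(w) > threshold).astype(w.dtype)
    return masks

# Example: prune two random layers to 90% sparsity in 5 progressive steps.
layers = [np.random.randn(64, 32), np.random.randn(32, 10)]
masks = progressive_magnitude_prune(layers, target_sparsity=0.9)
print([1 - m.mean() for m in masks])  # achieved sparsity per layer
```

In a federated setting, such per-layer decisions would additionally need to be coordinated across clients under memory and computation budgets; that coordination is beyond this sketch.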
Authors: Hong Huang, Lan Zhang, Chaoyue Sun, Ruogu Fang, Xiaoyong Yuan, Dapeng Wu