Exploring Parameter-Efficient Fine-Tuning to Enable Foundation Models in Federated Learning (2210.01708v5)
Abstract: Federated learning (FL) has emerged as a promising paradigm for collaboratively training models without centralized access to the raw data on local devices. In the typical FL setup (e.g., FedAvg), the full model weights are exchanged between the server and the participating clients every round. Initializing from small pre-trained models has recently been shown to improve federated optimization and convergence. However, state-of-the-art pre-trained models, now commonly called "foundation models," are growing more capable but also far larger. In conventional FL, sharing their enormous weights quickly places a massive communication burden on the system, especially as more capable models are adopted. Can we harness these strong, readily available pre-trained models in FL to achieve excellent performance while simultaneously reducing the communication burden? To this end, we investigate parameter-efficient fine-tuning (PEFT) in federated learning and introduce a new framework: FedPEFT. Specifically, we systematically evaluate FedPEFT across a variety of client stability, data distribution, and differential privacy settings. By locally tuning and globally sharing only a small portion of the model weights, FedPEFT achieves significant reductions in total communication overhead while maintaining competitive, and sometimes better, performance across a wide range of federated learning scenarios, offering a new paradigm for practical and effective federated systems.
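The core mechanism is simple: each client freezes the pre-trained backbone, fine-tunes only a small designated subset of the weights on its private data, and the server aggregates just that subset. Below is a minimal PyTorch sketch of one such round, not the paper's actual implementation; it assumes a BitFit-style choice of trainable parameters (bias terms only) as a stand-in for any PEFT method, and the helper names (`freeze_all_but_biases`, `fedavg_round`, etc.) are illustrative.

```python
import copy
import torch
import torch.nn as nn

def freeze_all_but_biases(model: nn.Module) -> None:
    """BitFit-style PEFT: freeze everything except bias terms."""
    for name, p in model.named_parameters():
        p.requires_grad = name.endswith("bias")

def trainable_subset(model: nn.Module) -> dict:
    """Return only the trainable (PEFT) parameters, detached for communication."""
    return {name: p.detach().clone()
            for name, p in model.named_parameters() if p.requires_grad}

def local_update(model: nn.Module, loader, epochs: int = 1, lr: float = 1e-3) -> dict:
    """One client's local fine-tuning pass; only the PEFT subset receives gradients."""
    opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return trainable_subset(model)

def fedavg_round(global_model: nn.Module, client_loaders) -> None:
    """One FedAvg-style round that communicates only the PEFT subset.
    Uses uniform averaging for simplicity; FedAvg proper weights clients
    by local dataset size."""
    updates = []
    for loader in client_loaders:
        client = copy.deepcopy(global_model)  # client starts from global weights
        updates.append(local_update(client, loader))
    # Average each shared tensor across clients and write it back into the global model.
    state = global_model.state_dict()
    for name in updates[0]:
        state[name] = torch.stack([u[name] for u in updates]).mean(dim=0)
    global_model.load_state_dict(state)

# Hypothetical usage, e.g. with a timm backbone:
#   model = timm.create_model("vit_base_patch16_224", pretrained=True)
#   freeze_all_but_biases(model)
#   for _ in range(num_rounds): fedavg_round(model, client_loaders)
```

Since bias terms make up on the order of 0.1% of a transformer backbone's parameters, each round under this scheme communicates orders of magnitude less than full-model FedAvg.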
Authors: Guangyu Sun, Umar Khalid, Matias Mendieta, Chen Chen, Pu Wang