Exploring Parameter-Efficient Fine-Tuning to Enable Foundation Models in Federated Learning (2210.01708v5)

Published 4 Oct 2022 in cs.LG and cs.CV

Abstract: Federated learning (FL) has emerged as a promising paradigm for enabling the collaborative training of models without centralized access to the raw data on local devices. In the typical FL paradigm (e.g., FedAvg), model weights are sent to and from the server each round to participating clients. Recently, the use of small pre-trained models has been shown to be effective in federated learning optimization and improving convergence. However, recent state-of-the-art pre-trained models, known as "foundation models," are becoming more capable but also have many more parameters. In conventional FL, sharing the enormous model weights can quickly put a massive communication burden on the system, especially if more capable models are employed. Can we find a solution to enable those strong and readily available pre-trained models in FL to achieve excellent performance while simultaneously reducing the communication burden? To this end, we investigate the use of parameter-efficient fine-tuning in federated learning and thus introduce a new framework: FedPEFT. Specifically, we systematically evaluate the performance of FedPEFT across a variety of client stability, data distribution, and differential privacy settings. By only locally tuning and globally sharing a small portion of the model weights, significant reductions in the total communication overhead can be achieved while maintaining competitive or even better performance in a wide range of federated learning scenarios, providing insight into a new paradigm for practical and effective federated systems.
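The mechanism behind FedPEFT's communication savings can be sketched in a few lines of PyTorch. The following is a minimal, hypothetical illustration rather than the authors' released code: it keeps only bias terms and the classification head trainable (bias-only tuning, in the style of BitFit, one of the PEFT variants this line of work considers) and runs a FedAvg-style round in which only that small parameter subset is exchanged. All function names here (mark_trainable, local_update, run_round) are invented for the sketch.

```python
# Hypothetical sketch of FedPEFT-style bias-only tuning with FedAvg
# aggregation. Illustrative only; not the authors' implementation.
import copy
import torch
import torch.nn as nn

def mark_trainable(model: nn.Module) -> None:
    # Freeze everything except bias terms and the classification head.
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias") or name.startswith("head")

def trainable_state(model: nn.Module) -> dict:
    # Only this small subset is ever communicated between client and server.
    return {n: p.detach().clone()
            for n, p in model.named_parameters() if p.requires_grad}

def local_update(model, loader, epochs=1, lr=1e-3, device="cpu"):
    # One client's local round: fine-tune only the trainable subset.
    model.to(device).train()
    opt = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
    return trainable_state(model)

def fedavg(states, weights):
    # Weighted (FedAvg-style) average of the communicated subsets.
    total = sum(weights)
    return {n: sum(w * s[n] for w, s in zip(weights, states)) / total
            for n in states[0]}

def run_round(global_model, client_loaders):
    # One communication round: broadcast, local PEFT, aggregate.
    # Assumes mark_trainable(global_model) was called once beforehand;
    # deepcopy preserves requires_grad flags on each client copy.
    states, sizes = [], []
    for loader in client_loaders:
        client = copy.deepcopy(global_model)
        states.append(local_update(client, loader))
        sizes.append(len(loader.dataset))
    # strict=False lets the partial state dict load over the full model.
    global_model.load_state_dict(fedavg(states, sizes), strict=False)
```

For a ViT-style backbone, bias terms amount to a small fraction of a percent of the total parameter count, which is where the communication savings described in the abstract come from; swapping the selection logic in mark_trainable for prompt tokens or adapter modules yields the other kinds of PEFT variants the paper evaluates.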

Authors (5)
  1. Guangyu Sun (47 papers)
  2. Umar Khalid (18 papers)
  3. Matias Mendieta (15 papers)
  4. Chen Chen (752 papers)
  5. Pu Wang (83 papers)
Citations (14)