FedPFT: Federated Proxy Fine-Tuning of Foundation Models (2404.11536v2)

Published 17 Apr 2024 in cs.LG and cs.AI

Abstract: Adapting Foundation Models (FMs) to downstream tasks through Federated Learning (FL) has emerged as a promising strategy for protecting both data privacy and valuable FMs. Existing methods fine-tune the FM by allocating a sub-FM to each client in FL; however, this leads to suboptimal performance due to insufficient tuning and inevitable accumulation of gradient errors. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method that enhances FM adaptation to downstream tasks through FL via two key modules. First, the sub-FM construction module employs a layer-wise compression approach that emphasizes crucial neurons, enabling comprehensive fine-tuning of the FM across all layers. Second, the sub-FM alignment module conducts two-step distillation, layer-level and neuron-level, before and during FL fine-tuning respectively, to reduce gradient error by accurately aligning the sub-FM with the FM under theoretical guarantees. Experimental results on seven commonly used datasets (i.e., four text and three vision) demonstrate the superiority of FedPFT.
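The abstract does not spell out how the layer-wise compression is implemented, but a minimal sketch of the idea can illustrate it. The sketch below assumes that "crucial neurons" are scored by some importance measure (e.g., mean activation magnitude on a small calibration batch) and that the sub-FM keeps only the top-scoring hidden units of each Transformer feed-forward block; the function name, the keep ratio, and the scoring criterion are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

def compress_ffn_layer(ffn: nn.Sequential, importance: torch.Tensor,
                       keep_ratio: float = 0.25) -> nn.Sequential:
    """Build a narrower feed-forward block that keeps only the most important hidden neurons.

    Assumes `ffn` is Linear -> activation -> Linear (a standard Transformer FFN) and
    `importance` holds one score per hidden neuron, e.g. mean activation magnitude
    measured on a calibration set (an assumed criterion for illustration).
    """
    up, act, down = ffn[0], ffn[1], ffn[2]
    k = max(1, int(up.out_features * keep_ratio))
    keep = torch.topk(importance, k).indices          # indices of the "crucial" neurons

    small_up = nn.Linear(up.in_features, k)
    small_down = nn.Linear(k, down.out_features)
    with torch.no_grad():
        small_up.weight.copy_(up.weight[keep])        # keep rows of the up-projection
        small_up.bias.copy_(up.bias[keep])
        small_down.weight.copy_(down.weight[:, keep]) # keep columns of the down-projection
        small_down.bias.copy_(down.bias)
    return nn.Sequential(small_up, act, small_down)

# Example: shrink one layer's FFN of a toy model to a quarter of its hidden width.
ffn = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
importance = torch.rand(3072)                         # placeholder scores for illustration
sub_ffn = compress_ffn_layer(ffn, importance)
```

Applying such a compression to every layer, rather than dropping layers outright, is what lets the clients' updates in FL still touch all layers of the FM, which the abstract identifies as the point of the construction module.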

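The alignment module's two distillation steps are likewise only named in the abstract. A hedged sketch of the layer-level step is given below, assuming each sub-FM layer is trained to match the hidden states of a corresponding frozen FM layer with an MSE objective; the layer pairing and the choice of loss are assumptions for illustration, not the paper's stated formulation.

```python
import torch
import torch.nn.functional as F

def layer_level_distillation_loss(fm_hiddens, sub_hiddens, layer_map):
    """Layer-level alignment: each sub-FM layer imitates one frozen FM layer.

    `fm_hiddens` / `sub_hiddens` are lists of hidden-state tensors, and
    `layer_map[i]` is the index of the FM layer that sub-FM layer `i` should
    match (the pairing scheme is an assumption for illustration).
    """
    loss = torch.zeros(())
    for i, j in enumerate(layer_map):
        loss = loss + F.mse_loss(sub_hiddens[i], fm_hiddens[j].detach())
    return loss / len(layer_map)

# Example with random tensors standing in for hidden states of a 12-layer FM
# distilled into a 4-layer sub-FM (batch 2, sequence length 8, width 768).
fm_hiddens = [torch.randn(2, 8, 768) for _ in range(12)]
sub_hiddens = [torch.randn(2, 8, 768, requires_grad=True) for _ in range(4)]
print(layer_level_distillation_loss(fm_hiddens, sub_hiddens, layer_map=[2, 5, 8, 11]))
```

The neuron-level step performed during FL fine-tuning would presumably add a finer-grained matching term on the selected neurons' activations, but the abstract does not give enough detail to reconstruct it here.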
Authors (8)
  1. Zhaopeng Peng (7 papers)
  2. Xiaoliang Fan (17 papers)
  3. Yufan Chen (34 papers)
  4. Zheng Wang (400 papers)
  5. Shirui Pan (198 papers)
  6. Chenglu Wen (30 papers)
  7. Ruisheng Zhang (5 papers)
  8. Cheng Wang (386 papers)
Citations (4)
