FedPFT: Federated Proxy Fine-Tuning of Foundation Models (2404.11536v2)
Abstract: Adapting Foundation Models (FMs) to downstream tasks through Federated Learning (FL) has emerged as a promising strategy for protecting both data privacy and valuable FMs. Existing methods fine-tune the FM by allocating a sub-FM to each client in FL; however, this leads to suboptimal performance due to insufficient tuning and the inevitable accumulation of gradient errors. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method that enhances FM adaptation to downstream tasks through FL via two key modules. First, the sub-FM construction module employs a layer-wise compression approach that emphasizes crucial neurons, enabling comprehensive fine-tuning of the FM across all layers. Second, the sub-FM alignment module conducts two-step distillation (layer-level and neuron-level) before and during FL fine-tuning, respectively, to reduce gradient error by accurately aligning the sub-FM with the FM under theoretical guarantees. Experimental results on seven commonly used datasets (four text and three vision) demonstrate the superiority of FedPFT.
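To make the two modules concrete, below is a minimal PyTorch sketch of the kind of operations the abstract describes: a layer-level distillation loss that aligns sub-FM hidden states with the corresponding FM layers, and an activation-based scoring heuristic for identifying "crucial" neurons during layer-wise compression. The `layer_map` mapping, the MSE objective, and the mean-absolute-activation criterion are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def layer_level_distill_loss(fm_hidden_states, sub_fm_hidden_states, layer_map):
    """MSE between FM hidden states and the sub-FM layers that stand in for them.

    layer_map: dict mapping each sub-FM layer index to the FM layer index it is
    compressed from (an illustrative choice, not necessarily the paper's mapping).
    Both inputs are lists/tuples of tensors of shape (batch, seq_len, hidden_dim).
    """
    loss = 0.0
    for sub_idx, fm_idx in layer_map.items():
        # The FM acts as a fixed teacher, so its activations are detached.
        loss = loss + F.mse_loss(sub_fm_hidden_states[sub_idx],
                                 fm_hidden_states[fm_idx].detach())
    return loss / len(layer_map)

def score_neurons_by_activation(ffn_activations):
    """Rank FFN neurons by mean absolute activation over a probe batch.

    ffn_activations: tensor of shape (batch, seq_len, ffn_dim).
    Returns neuron indices sorted from most to least "crucial"; this is a common
    pruning heuristic used here only as a stand-in for the paper's criterion.
    """
    scores = ffn_activations.abs().mean(dim=(0, 1))   # (ffn_dim,)
    return torch.argsort(scores, descending=True)
```

Under this reading, layer-level distillation would run on the server before FL starts, while the neuron scores would decide which FFN columns each compressed layer retains; a finer neuron-level alignment would then be applied during FL rounds.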
Authors: Zhaopeng Peng, Xiaoliang Fan, Yufan Chen, Zheng Wang, Shirui Pan, Chenglu Wen, Ruisheng Zhang, Cheng Wang