Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models (2401.06432v2)
Abstract: Foundation models (FMs) adapt well to specific domains or tasks with fine-tuning, and federated learning (FL) enables privacy-preserving fine-tuning of FMs with on-device local data. For federated fine-tuning, we consider FMs with small to medium parameter sizes (at most single-digit billions), referred to as on-device FMs (ODFMs), which can be deployed on devices for inference but can only be fine-tuned with parameter-efficient methods. In our work, we tackle the data and system heterogeneity problem of federated fine-tuning of ODFMs by proposing a novel method using heterogeneous low-rank approximations (LoRAs), namely HetLoRA. First, we show that the naive approach of using homogeneous LoRA ranks across devices faces a trade-off between overfitting and slow convergence, and thus propose HetLoRA, which allows heterogeneous ranks across client devices and efficiently aggregates and distributes these heterogeneous LoRA modules. By applying rank self-pruning locally and sparsity-weighted aggregation at the server, HetLoRA combines the advantages of high- and low-rank LoRAs, achieving improved convergence speed and final performance compared to homogeneous LoRA. Furthermore, HetLoRA offers enhanced computational efficiency compared to full fine-tuning, making it suitable for federated fine-tuning across heterogeneous devices.
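The abstract describes a server that must aggregate LoRA modules whose ranks differ across clients and then redistribute a global module back to each client. The sketch below illustrates one plausible reading of that pipeline under stated assumptions: zero-pad each client's factors (B_k, A_k) to the largest participating rank, combine them with a sparsity-derived weight, and truncate the global module back to each client's own rank on distribution. The function names (`aggregate_hetlora`, `truncate_for_client`), the L1-mass weighting proxy, and the omission of local rank self-pruning are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal NumPy sketch (assumptions noted above): heterogeneous-rank LoRA
# aggregation by zero-padding and a sparsity-weighted average, followed by
# rank truncation when redistributing to clients.
import numpy as np

def aggregate_hetlora(client_updates):
    """client_updates: list of (B_k, A_k), with B_k: d_out x r_k and A_k: r_k x d_in."""
    r_max = max(B.shape[1] for B, _ in client_updates)
    padded, weights = [], []
    for B, A in client_updates:
        r = B.shape[1]
        B_pad = np.pad(B, ((0, 0), (0, r_max - r)))   # zero-pad columns up to r_max
        A_pad = np.pad(A, ((0, r_max - r), (0, 0)))   # zero-pad rows up to r_max
        padded.append((B_pad, A_pad))
        # Assumed sparsity proxy: total L1 mass of the reconstructed update B_k A_k.
        weights.append(np.abs(B @ A).sum())
    w = np.asarray(weights) / np.sum(weights)          # normalize to a convex combination
    B_glob = sum(wi * Bp for wi, (Bp, _) in zip(w, padded))
    A_glob = sum(wi * Ap for wi, (_, Ap) in zip(w, padded))
    return B_glob, A_glob                              # global module of rank r_max

def truncate_for_client(B_glob, A_glob, rank):
    """Distribute to a client by keeping only its first `rank` components."""
    return B_glob[:, :rank], A_glob[:rank, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, ranks = 16, 32, [2, 4, 8]             # heterogeneous client ranks
    updates = [(rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in)))
               for r in ranks]
    B_g, A_g = aggregate_hetlora(updates)
    B_c, A_c = truncate_for_client(B_g, A_g, rank=4)
    print(B_g.shape, A_g.shape, B_c.shape, A_c.shape)  # (16, 8) (8, 32) (16, 4) (4, 32)
```

Zero-padding keeps aggregation in the factored space (no full-weight reconstruction on the server), while truncation lets a low-capacity client receive only as many rank components as it can train; both choices here are sketches of the mechanisms named in the abstract rather than the paper's exact procedure.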
Authors: Yae Jee Cho, Luyang Liu, Zheng Xu, Aldi Fahrezi, Gauri Joshi