FedPEAT: Convergence of Federated Learning, Parameter-Efficient Fine Tuning, and Emulator Assisted Tuning for Artificial Intelligence Foundation Models with Mobile Edge Computing (2310.17491v2)

Published 26 Oct 2023 in cs.LG and cs.NI

Abstract: The emergence of foundation models, including language and vision models, has reshaped the AI landscape, offering capabilities across various applications. Deploying and fine-tuning these large models, such as GPT-3 and BERT, presents significant challenges in the current foundation-model era. We introduce Emulator-Assisted Tuning (EAT) combined with Parameter-Efficient Fine-Tuning (PEFT) to form Parameter-Efficient Emulator-Assisted Tuning (PEAT), and further extend this approach to federated learning as Federated PEAT (FedPEAT). FedPEAT uses adapters, emulators, and PEFT for federated model tuning, enhancing model privacy and memory efficiency. Adapters adjust the pre-trained model, while emulators provide a compact representation of the original model, addressing both privacy and efficiency. Adaptable to various neural networks, our approach also uses deep reinforcement learning for hyper-parameter optimization. We tested FedPEAT in a scenario where the server itself participates in collaborative federated tuning, showcasing its potential for tackling foundation-model challenges.
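The abstract describes the mechanism only at a high level: each client holds a compact, frozen emulator of the foundation model plus a small trainable adapter, and only adapter updates are exchanged with the server. The sketch below illustrates that flow under assumptions of mine; the `Emulator`, `Adapter`, `client_update`, and `aggregate` names, the layer shapes, and the FedAvg-style averaging are illustrative placeholders, not the paper's actual architecture, aggregation rule, or DRL-based hyper-parameter control.

```python
# Minimal FedPEAT-style sketch (hypothetical, not the paper's implementation):
# a frozen "emulator" stands in for the full foundation model, clients train
# only a lightweight adapter, and the server averages adapter weights.
import copy
import torch
import torch.nn as nn

class Emulator(nn.Module):
    """Compressed, frozen proxy for the full foundation model (hypothetical)."""
    def __init__(self, dim=32):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        for p in self.parameters():
            p.requires_grad = False  # emulator weights are never updated

class Adapter(nn.Module):
    """Small trainable module attached to the emulator (PEFT-style)."""
    def __init__(self, dim=32, num_classes=2):
        super().__init__()
        self.proj = nn.Linear(dim, num_classes)

def client_update(emulator, adapter, data, steps=5, lr=1e-2):
    """Local tuning: gradients flow only into the adapter's parameters."""
    adapter = copy.deepcopy(adapter)
    opt = torch.optim.Adam(adapter.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    x, y = data
    for _ in range(steps):
        logits = adapter.proj(emulator.backbone(x))
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapter.state_dict()  # only adapter weights leave the client

def aggregate(adapter_states):
    """Server-side FedAvg over adapter weights only."""
    avg = copy.deepcopy(adapter_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in adapter_states]).mean(dim=0)
    return avg

if __name__ == "__main__":
    dim, emulator, global_adapter = 32, Emulator(), Adapter()
    # Synthetic per-client data, standing in for private local datasets.
    clients = [(torch.randn(16, dim), torch.randint(0, 2, (16,))) for _ in range(3)]
    for _ in range(2):  # two federated rounds
        states = [client_update(emulator, global_adapter, d) for d in clients]
        global_adapter.load_state_dict(aggregate(states))
```

In this sketch only the adapter weights cross the network each round, which is the source of the memory and privacy benefits the abstract claims; the emulator never leaves the server in trainable form, and the paper's deep-reinforcement-learning hyper-parameter optimization is omitted.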
