Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (2312.06353v5)

Published 11 Dec 2023 in cs.LG and cs.DC

Abstract: Pre-trained LLMs need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance height possible with full-parameter tuning. However, federated full-parameter tuning of LLMs is a non-trivial problem due to the immense communication cost. This work introduces FedKSeed that employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds and scalar gradients, amounting to only a few thousand bytes, making federated full-parameter tuning of billion-sized LLMs possible on devices. Building on it, we develop a strategy enabling probability-differentiated seed sampling, prioritizing perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and new task generalization.

Federated Full-Parameter Tuning of Billion-Sized LLMs with Communication Cost under 18 Kilobytes

This paper addresses a significant challenge in the federated learning (FL) of LLMs: enabling full-parameter tuning of billion-sized LLMs on decentralized devices without incurring prohibitive communication costs. The authors introduce a novel method, FedKSeed, which strategically minimizes the communication overhead during the FL process to less than 18 kilobytes per round.
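
As a rough back-of-envelope illustration (the assumed numbers are mine, not the paper's exact accounting): if the server tracks one accumulated scalar gradient per candidate seed, then with K = 4096 seeds and 32-bit scalars the dominant per-round payload is

$K \times 4\,\text{B} = 4096 \times 4\,\text{B} \approx 16\,\text{KB},$

which leaves room for seed indices and bookkeeping within the stated 18-kilobyte budget.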

Problem Context

LLMs require fine-tuning to perform well on specific tasks. Conventional federated full-parameter tuning of LLMs, however, is both computationally expensive and communication-heavy: exchanging the full set of model parameters typically amounts to gigabytes per round for billion-parameter models, which is infeasible for devices with limited bandwidth and storage. Most existing work therefore relies on parameter-efficient fine-tuning (PEFT) methods, which reduce this overhead but often cannot match the performance of full-parameter tuning.

Research Contributions

The core of the proposal, FedKSeed, uses zeroth-order optimization (ZOO) with a predefined, finite set of random seeds, eliminating the need to transmit full model parameters. Each local update step can be communicated as just a random seed and a scalar gradient, which can be encoded very compactly. Unlike traditional methods, whose communication scales with model size, this payload is essentially constant.
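
To make this concrete, here is a minimal sketch (not the authors' implementation; the function and variable names are invented for this example) of a client-side zeroth-order step: the perturbation is regenerated from a shared seed, so only the seed index and one scalar ever leave the device.

```python
import torch

def client_zoo_step(params, loss_fn, seed_pool, seed_idx, eps=1e-3, lr=3e-4):
    """One zeroth-order (forward-only) step; returns the tiny payload.

    Sketch assumptions: `params` is a flat parameter tensor, `loss_fn` evaluates the
    model on a local batch, and `seed_pool` is the fixed list of K seeds shared by all
    parties. A memory-frugal implementation would regenerate the perturbation in chunks
    rather than materializing it, but the logic is the same.
    """
    gen = torch.Generator().manual_seed(seed_pool[seed_idx])
    z = torch.randn(params.numel(), generator=gen)       # perturbation reproducible from the seed

    loss_plus = loss_fn(params + eps * z)                 # forward pass at w + eps*z
    loss_minus = loss_fn(params - eps * z)                # forward pass at w - eps*z
    scalar_grad = (loss_plus - loss_minus) / (2 * eps)    # projected gradient along z (a scalar)

    params -= lr * scalar_grad * z                        # local full-parameter update
    return seed_idx, float(scalar_grad)                   # a few bytes, independent of model size
```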

A notable design choice is that every perturbation applied during model updates is drawn from a fixed pool of K candidate random seeds. Restricting the perturbation space in this way is what lets full-parameter tuning in FL remain communication-efficient without giving up the benefits of directly updating all parameters: any participant can reproduce every perturbation locally from its seed.
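
Because every perturbation can be regenerated from one of the shared seeds, a device can bring its copy of the model up to date by replaying the logged (seed index, scalar gradient) pairs instead of downloading weights. A minimal sketch, under the same illustrative assumptions as above:

```python
import torch

def replay_updates(params, seed_pool, update_log, lr=3e-4):
    """Synchronize to the latest model from scalar-gradient history.

    `update_log` is the accumulated list of (seed_idx, scalar_grad) pairs; each entry
    is only a few bytes, so synchronization cost does not scale with model size.
    """
    for seed_idx, scalar_grad in update_log:
        gen = torch.Generator().manual_seed(seed_pool[seed_idx])
        z = torch.randn(params.numel(), generator=gen)   # identical perturbation to the sender's
        params -= lr * scalar_grad * z                    # re-apply the update locally
    return params
```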

The paper further introduces FedKSeed-Pro, a variant that assigns non-uniform sampling probabilities to the K seeds based on the estimated importance of their perturbations, with the aim of improving both computational efficiency and model accuracy.
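
The sampling itself can be as simple as a softmax over per-seed importance scores; the sketch below treats the caller-supplied scores (for example, running means of absolute scalar gradients per seed) as given, and does not reproduce the paper's specific weighting rule.

```python
import numpy as np

def sample_seed(importance_scores, temperature=1.0, rng=None):
    """Draw a seed index with probability increasing in its estimated importance.

    `importance_scores` is a length-K array; higher-impact perturbations are
    sampled more often, while every seed keeps a nonzero probability.
    """
    rng = rng if rng is not None else np.random.default_rng()
    logits = np.asarray(importance_scores, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())                 # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```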

Experimental Setup and Results

The authors conduct experiments across six scenarios combining models such as DataJuicer-1.3B and LLaMA-3B with different datasets and data partitions. FedKSeed not only outperforms other federated fine-tuning approaches in communication efficiency but also achieves better zero-shot generalization to unseen tasks; for instance, FedKSeed-Pro yields an average 7.26% improvement in ROUGE-L over the best competing method.

Table 1 in the paper provides a qualitative comparison indicating that FedKSeed outperforms other methods in terms of both memory and communication costs, establishing its practicality for federated LLM tuning on devices.

Theoretical Insights

The paper builds on existing convergence theories of ZOO, adapting them to the federated context. By effectively managing noise in gradient estimation through a fixed set of seeds, it maintains convergence properties while drastically reducing communication demands.
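
For reference, the two-point zeroth-order estimator that this line of analysis typically builds on (notation mine, not quoted from the paper) is

$\hat{\nabla} f(w) \;=\; \frac{f(w + \epsilon z) - f(w - \epsilon z)}{2\epsilon}\, z, \qquad z \sim \mathcal{N}(0, I),$

where, once z is reproducible from a shared seed, only the scalar coefficient in front of z carries new information.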

The authors present two principles guiding the choice of K: it should not be so small that the model is insufficiently trained, nor so large that computation grows and the updates accumulated per seed become diluted. This balances computational efficiency against model fidelity.

Implications and Future Work

FedKSeed and its enhanced version represent significant advancements in making federated full-parameter tuning of LLMs feasible on edge devices. This work opens the door to more equitable and extensive use of large models in decentralized settings, potentially democratizing access to advanced LLM capabilities.

Looking forward, the approach may encourage future research into decentralized FL architectures that can leverage the reduced communication requirements, as well as further explorations of ZOO-based methods in other model learning paradigms. Additionally, optimizing seed selection strategies and exploring other ZOO variants could extend these results to an even broader range of applications and model types.

Authors (6)
  1. Zhen Qin
  2. Daoyuan Chen
  3. Bingchen Qian
  4. Bolin Ding
  5. Yaliang Li
  6. Shuiguang Deng