Federated Full-Parameter Tuning of Billion-Sized LLMs with Communication Cost under 18 Kilobytes
This paper tackles a central challenge in the federated learning (FL) of LLMs: enabling full-parameter tuning of billion-sized models on decentralized devices without incurring prohibitive communication costs. The authors introduce a novel method, FedKSeed, which cuts the communication overhead of each FL round to less than 18 kilobytes.
Problem Context
LLMs require fine-tuning to perform well on specific tasks. However, traditional federated full-parameter tuning of LLMs is computationally expensive and requires transmitting the full model each round, typically gigabytes of data, which is infeasible for devices with limited bandwidth and storage. Most existing work therefore relies on parameter-efficient fine-tuning (PEFT) methods, which reduce some of this overhead but often cannot match the performance of full-parameter tuning.
Research Contributions
The core of the proposal, FedKSeed, uses zeroth-order optimization (ZOO) with a predefined set of random seeds, eliminating the need to transmit full model parameters. Each client update consists only of a seed and a scalar gradient, both of which can be encoded in a few bytes, so the communication cost no longer scales with model size.
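To make this concrete, here is a minimal sketch of one seed-based ZOO step in PyTorch. The interface (`loss_fn`, `eps`, `lr`) is an illustrative assumption, not the paper's implementation:

```python
import torch

def zoo_step(model, loss_fn, batch, seed, eps=1e-3, lr=3e-4):
    # Sketch of one seed-based ZOO step. `loss_fn(model, batch)` is assumed
    # to return a scalar loss; eps and lr are illustrative hyperparameters.
    gen = torch.Generator().manual_seed(seed)
    params = [p for p in model.parameters() if p.requires_grad]
    # Regenerate the random direction z ~ N(0, I) deterministically from the
    # seed. (A paper-style implementation would regenerate z on the fly
    # instead of storing it, keeping memory near inference-only levels.)
    zs = [torch.randn(p.shape, generator=gen).to(p.device) for p in params]

    def perturb(scale):
        with torch.no_grad():
            for p, z in zip(params, zs):
                p.add_(scale * eps * z)

    # Two-point estimate of the directional derivative along z.
    perturb(+1.0)
    loss_plus = float(loss_fn(model, batch))
    perturb(-2.0)
    loss_minus = float(loss_fn(model, batch))
    perturb(+1.0)  # restore the original parameters
    grad_scalar = (loss_plus - loss_minus) / (2 * eps)

    # Apply the update locally; only (seed, grad_scalar) leaves the device.
    with torch.no_grad():
        for p, z in zip(params, zs):
            p.add_(-lr * grad_scalar * z)
    return seed, grad_scalar
```

Because any party holding the seed can regenerate the perturbation exactly, the `(seed, grad_scalar)` pair fully describes the update.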
A notable innovation is that the perturbations used in model updates are drawn from a fixed pool of K candidate random seeds (the K in FedKSeed). Because the pool is finite, every past update can be reconstructed from a short history of seed-and-scalar pairs, which is what lets full-parameter tuning in FL achieve communication efficiency without losing the benefits of directly tuning all parameters.
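A sketch of how a client could catch up to the latest global model by replaying the server's accumulated pairs; the helper name and interface are hypothetical:

```python
import torch

def sync_from_history(model, history, lr=3e-4):
    # Hypothetical helper: replay the server's accumulated (seed, grad_scalar)
    # pairs on top of the shared base model. Because the seed pool is finite,
    # `history` can be compacted to at most K aggregated scalars per round.
    params = [p for p in model.parameters() if p.requires_grad]
    with torch.no_grad():
        for seed, grad_scalar in history:
            gen = torch.Generator().manual_seed(seed)
            for p in params:
                # Regenerate the exact perturbation the updating client used.
                z = torch.randn(p.shape, generator=gen).to(p.device)
                p.add_(-lr * grad_scalar * z)
```

This replay step is why the pool must be finite: an unbounded seed space would make the history grow without limit, while K seeds bound what the server ever needs to ship.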
Moreover, the paper introduces a variant, FedKSeed-Pro, which assigns the K seeds non-uniform sampling probabilities based on the estimated importance of their perturbations, aiming to improve both computational efficiency and model accuracy.
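One plausible way to realize this, sketched below with an assumed softmax-over-importance rule (the paper's exact estimator may differ):

```python
import numpy as np

def seed_probabilities(scalar_grads_by_seed, temperature=1.0):
    # Sketch of non-uniform seed sampling: seeds whose perturbations produced
    # larger-magnitude scalar gradients are treated as more informative. The
    # softmax form and `temperature` are assumptions, not the paper's formula.
    importance = np.array(
        [np.mean(np.abs(g)) if len(g) else 0.0 for g in scalar_grads_by_seed]
    )
    logits = importance / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()

# Usage: sample the seed index for the next local step.
# probs = seed_probabilities(history_per_seed)
# k = np.random.choice(len(probs), p=probs)
```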
Experimental Setup and Results
The authors conduct experiments across six scenarios combining two models (DataJuicer-1.3B and LLaMA-3B) with multiple datasets and data partitions. FedKSeed not only outperforms other federated fine-tuning approaches in communication efficiency but also achieves better zero-shot generalization; for instance, FedKSeed-Pro improves Rouge-L scores by 7.26% on average over the best-performing baseline.
Table 1 in the paper provides a qualitative comparison indicating that FedKSeed outperforms other methods in terms of both memory and communication costs, establishing its practicality for federated LLM tuning on devices.
Theoretical Insights
The paper builds on existing convergence theory for ZOO, adapting it to the federated setting. It shows that restricting gradient estimation to a fixed set of seeds keeps the added estimation noise under control, so convergence properties are maintained while communication demands drop drastically.
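For reference, the standard two-point estimator underlying this style of ZOO analysis can be written as follows (generic notation, not the paper's exact symbols):

```latex
\hat{g}_t = \frac{\mathcal{L}(w_t + \epsilon z_t) - \mathcal{L}(w_t - \epsilon z_t)}{2\epsilon},
\qquad
w_{t+1} = w_t - \eta \, \hat{g}_t \, z_t,
\qquad z_t \sim \mathcal{N}(0, I_d)
```

With a fixed pool of K seeds, the direction z_t is drawn from only K candidates, and the scalar \hat{g}_t is exactly the per-step value a client transmits alongside the seed.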
The authors present two principles guiding the choice of K: it should not be too small, since too few candidate perturbations leave the model insufficiently trained, and not too large, since excess seeds increase computation and dilute the effect of each update. This balances computational efficiency against model fidelity.
Implications and Future Work
FedKSeed and its enhanced version represent significant advancements in making federated full-parameter tuning of LLMs feasible on edge devices. This work opens the door to more equitable and extensive use of large models in decentralized settings, potentially democratizing access to advanced LLM capabilities.
Looking forward, the approach may encourage future research into decentralized FL architectures that can leverage the reduced communication requirements, as well as further explorations of ZOO-based methods in other model learning paradigms. Additionally, optimizing seed selection strategies and exploring other ZOO variants could extend these results to an even broader range of applications and model types.