Federated Fine-tuning

Updated 28 May 2026
  • Federated fine-tuning adapts pre-trained models using decentralized data, enhancing privacy and eliminating centralization.
  • Methodologies involve single or multi-round aggregation protocols, making large models scalable with minimal communication.
  • Recent advances show models exceeding 1B parameters retain accuracy with one communication round, optimizing efficiency.

Federated fine-tuning is a distributed machine learning paradigm designed to adapt large pre-trained neural network models—such as foundation models (FMs), LLMs, and multimodal transformers—to new tasks or domains using private, decentralized datasets. This approach leverages federated learning (FL) protocols to coordinate multiple clients (devices, organizations, or silos), enabling model adaptation without the need to centralize locally held data. Federated fine-tuning is technically challenging due to the resource demands of state-of-the-art models, the statistical heterogeneity of client data, and strict communication or privacy constraints. It encompasses both full-parameter fine-tuning and modern parameter-efficient methods (adapters, LoRA, tensorization), and supports a variety of modalities, architectures, and optimization schemes. Recent work demonstrates that, for sufficiently large foundation models, even a single communication round can suffice to match classic multi-round federated aggregation in accuracy, drastically improving scalability and accessibility (Wang et al., 2024). The following sections survey the mathematical foundations, algorithmic developments, efficiency techniques, practical considerations, and open challenges in federated fine-tuning.

1. Mathematical Formulation and Problem Setup

Federated fine-tuning aims to adapt a pre-trained model $w^{(0)} \in \mathbb{R}^d$ across $K$ clients, each holding a private dataset $D_k$ of size $n_k$, with $n = \sum_{k=1}^K n_k$. The canonical objective is to minimize the weighted sum of local empirical losses:

$$\min_{w \in \mathbb{R}^d} F(w) \equiv \sum_{k=1}^K p_k L_k(w)\,, \quad p_k = n_k / n\,,$$

where

$$L_k(w) = \frac{1}{n_k} \sum_{(x, y) \in D_k} \ell(w; x, y)$$

and $\ell$ is a per-example loss such as cross-entropy (Wang et al., 2024). Commonly used assumptions are $L$-smoothness (Lipschitz-continuous gradients) and a fine-tuning update whose norm is small relative to the pre-trained initialization:

$$\|\nabla L_k(w) - \nabla L_k(w')\| \leq L \|w - w'\|\,; \quad \|w - w^{(0)}\| \leq \tau \|w^{(0)}\|,\; \tau < 1\,.$$

In multi-modal contexts, the loss may further mix distinct modalities and tasks, leading to more complex objective decompositions (Zheng et al., 11 Jun 2025).
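As a concrete reading of the objective above, the following minimal sketch evaluates $F(w)$ for a toy logistic model. The client sizes, the random data, and the helper names `local_loss` and `federated_objective` are illustrative assumptions, not taken from the cited work. It also checks a consequence of the weighting $p_k = n_k / n$: the federated objective equals the mean loss over the pooled data.

```python
import numpy as np

def local_loss(w, X, y):
    """Mean binary cross-entropy L_k(w) of a logistic model on one client's data."""
    logits = X @ w
    # -[y log s(z) + (1 - y) log(1 - s(z))] = log(1 + e^z) - y z
    return float(np.mean(np.logaddexp(0.0, logits) - y * logits))

def federated_objective(w, clients):
    """F(w) = sum_k p_k L_k(w) with p_k = n_k / n."""
    n = sum(len(y) for _, y in clients)
    return sum(len(y) / n * local_loss(w, X, y) for X, y in clients)

# Hypothetical toy setup: three clients with unequal dataset sizes.
rng = np.random.default_rng(0)
sizes = [5, 20, 75]
clients = [
    (rng.normal(size=(m, 3)), rng.integers(0, 2, size=m).astype(float))
    for m in sizes
]
w = rng.normal(size=3)

F = federated_objective(w, clients)

# Pooling all data and averaging gives the same value, because each
# L_k is a mean over n_k examples and is weighted by n_k / n.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
pooled = local_loss(w, X_all, y_all)
```

The identity holds only for the values of the two objectives; the data itself never has to be pooled to optimize $F$, which is the point of the federated setting.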

Clients train models locally, either updating all parameters or using parameter-efficient modules. Communication with a central server aggregates updates, usually by weighted averaging. This process repeats for $T$ rounds, unless optimized for one-shot settings. Variants include hierarchy-based aggregation (Liu et al., 27 Mar 2025), clustered or asynchronous aggregation (Ni et al., 27 Mar 2025), and client personalization techniques (Zhang et al., 2024).

2. Core Algorithms: From Multi-Round to One-Shot Aggregation

The algorithmic foundation remains federated averaging (FedAvg), where the server distributes the global model $w^{(t)}$, clients perform local training, and the server aggregates client-updated parameters: $w^{(t+1)} = \sum_{k=1}^K p_k\, w_k^{(t+1)}$, with $w_k^{(t+1)}$ obtained from local SGD/Adam runs initialized at $w^{(t)}$ (Wang et al., 2024).
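The FedAvg loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a least-squares loss stands in for the fine-tuning loss, full-batch gradient descent stands in for local SGD/Adam, and the names `local_sgd`, `fedavg_round`, and `global_loss` as well as the data and hyperparameters are hypothetical.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """Full-batch gradient descent on a least-squares loss (stand-in for local SGD/Adam)."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients, lr=0.1, epochs=5):
    """One round: broadcast, local training, weighted average with p_k = n_k / n."""
    n = sum(len(y) for _, y in clients)
    local_models = [local_sgd(w_global, X, y, lr, epochs) for X, y in clients]
    weights = np.array([len(y) / n for _, y in clients])
    return np.average(local_models, axis=0, weights=weights)

def global_loss(w, clients):
    """F(w) = sum_k p_k L_k(w) for the least-squares stand-in loss."""
    n = sum(len(y) for _, y in clients)
    return sum(len(y) / n * 0.5 * np.mean((X @ w - y) ** 2) for X, y in clients)

# Hypothetical toy data: three clients sharing one underlying linear model.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for n_k in [10, 30, 60]:
    X = rng.normal(size=(n_k, 3))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=n_k)))

w = np.zeros(3)
initial = global_loss(w, clients)
for _ in range(20):          # T = 20 communication rounds
    w = fedavg_round(w, clients)
final = global_loss(w, clients)
```

Only model parameters cross the client boundary in `fedavg_round`; the arrays `X` and `y` stay inside `local_sgd`, mirroring the privacy motivation of the protocol.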

A key advance is the realization that for large, smooth foundation models, a single communication round, in which each client completes all of its local epochs before sending one upload, produces a global model nearly indistinguishable from that obtained via multi-round FedAvg. Theoretical analysis (Theorem 1 of Wang et al., 2024) bounds the discrepancy as

KK3

where $T$ is the standard number of rounds and KK5 is local training. Empirically, for models exceeding 1B parameters, the error contribution from missing inter-round aggregation (KK7) collapses, as the flatness of the loss landscape and tiny weight drift render the one-shot error negligible. This enables a reduction in communication by a factor of $T$ while preserving task accuracy, a finding supported across NLP and generative vision benchmarks (Wang et al., 2024).
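The role of small weight drift can be illustrated on a toy problem. The sketch below is not the analysis of Wang et al. (2024); it assumes two clients with scalar quadratic losses $L_k(w) = \tfrac{1}{2} a_k (w - c_k)^2$ and compares multi-round averaging ($T$ rounds of $E$ local steps) against a single round of $T \cdot E$ local steps. All constants and function names are hypothetical.

```python
def local_gd(w, a, c, lr, steps):
    """Gradient descent on L_k(w) = 0.5 * a * (w - c)**2."""
    for _ in range(steps):
        w -= lr * a * (w - c)
    return w

def run(rounds, local_steps, lr, curv, opt, p, w0=0.0):
    """Repeated broadcast, local training, and weighted averaging."""
    w = w0
    for _ in range(rounds):
        w = sum(
            pk * local_gd(w, a, c, lr, local_steps)
            for pk, a, c in zip(p, curv, opt)
        )
    return w

# Heterogeneous clients: different curvatures and different local optima.
curv, opt, p = [1.0, 3.0], [1.0, -1.0], [0.5, 0.5]
T, E = 4, 5

def gap(lr):
    """Distance between the multi-round and one-shot global models."""
    multi = run(T, E, lr, curv, opt, p)
    one_shot = run(1, T * E, lr, curv, opt, p)
    return abs(multi - one_shot)

gap_large_drift = gap(0.01)    # weights move further from w0
gap_small_drift = gap(0.001)   # weights barely move from w0
```

In this toy setting the gap between the two protocols shrinks as the learning rate, and hence the distance travelled from the initialization, shrinks. That is consistent with the intuition stated above, but it is a sanity check on quadratics rather than evidence about billion-parameter models.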

More broadly, federated fine-tuning frameworks now encompass:

3. Parameter-Efficient and Modular Fine-Tuning Strategies

Due to the prohibitive resource requirements for full-model fine-tuning, federated fine-tuning has widely adopted parameter-efficient techniques:

  • Adapter/Prompt/BitFit Approaches: Only a small adapter, prompt, or bias vector is trained on each client; remaining weights are frozen. CLIP-bias and ViT-adapter can reach near-centralized performance in federated settings (Chen et al., 2022).
  • LoRA (Low-Rank Adaptation): Each trainable weight $W$ in the backbone is decomposed as $W = W_0 + BA$ with $B$, $A$ low-rank, separately optimized and aggregated. LoRA achieves strong efficiency gains and is now the default for LLMs in FL (Liu et al., 2024; Ghiasvand et al., 2024).
  • Tensorized Adapters (FedTT/FedTT+): Instead of full or LoRA matrices, adapters are tensor-train (TT) decomposed, further reducing communication by up to $10\times$ versus LoRA and improving robustness in non-IID splits (Ghiasvand et al., 2024).
  • Representation Fine-Tuning (FedReFT): Direct intervention layers are trained on internal activations rather than weights, with aggregation schemes like All-but-Me to control semantic drift (Siddika et al., 27 Aug 2025).
  • Mixture-of-Experts, Masking, Clustered Modules: Approaches such as FedHFT use mixture-of-masked adapters, and FedAMoLE introduces adaptive mixtures and dynamic expert assignments for heterogeneity (Ilhan et al., 15 Oct 2025, Zhang et al., 2024).
  • Proxy and Compressed FMs: Sub-model or compressed FM versions fine-tune all layers, with careful alignment (
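
To make the LoRA item in the list above concrete, the following sketch shows factor-wise aggregation of low-rank adapters and the resulting communication saving. It assumes the standard LoRA form $W = W_0 + BA$ with frozen $W_0$; the dimensions, client sizes, and the random perturbation standing in for local training are illustrative assumptions, not details from the cited papers.

```python
import numpy as np

d_out, d_in, rank = 64, 32, 4
rng = np.random.default_rng(0)
W0 = rng.normal(size=(d_out, d_in))   # frozen pre-trained weight, never communicated

def init_lora():
    """B starts at zero so that W0 + B @ A equals W0 before training."""
    B = np.zeros((d_out, rank))
    A = rng.normal(scale=0.01, size=(rank, d_in))
    return B, A

# Stand-in for local fine-tuning: each client perturbs its own B factor.
clients = []
for n_k in [10, 30, 60]:
    B, A = init_lora()
    B = B + rng.normal(scale=0.01, size=B.shape)
    clients.append((n_k, B, A))

# Server side: weighted average of each factor with p_k = n_k / n.
n = sum(n_k for n_k, _, _ in clients)
B_avg = sum(n_k / n * B for n_k, B, _ in clients)
A_avg = sum(n_k / n * A for n_k, _, A in clients)
W_global = W0 + B_avg @ A_avg

# Per-client upload size: both factors versus the full matrix.
lora_params = rank * (d_out + d_in)
full_params = d_out * d_in
```

Averaging $B$ and $A$ separately is not, in general, the same as averaging the products $B_k A_k$, so the choice of what to aggregate is a design decision rather than a detail.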
