Adapter-based Federated LLMs

Updated 31 January 2026
  • Adapter-based FedLLMs are methodologies that integrate small, parameter-efficient modules into frozen LLMs for efficient federated fine-tuning without full model weight sharing.
  • They employ various adapter architectures, such as LoRA and bottleneck adapters, to minimize local computation and communication, reducing data transmission by over 90% compared to full updates.
  • Empirical findings show near-centralized performance with enhanced personalization and scalability, though challenges remain in ensuring robust privacy and security.

Adapter-based Federated LLMs (FedLLMs) are a class of methodologies for privacy-preserving and computation/communication-efficient fine-tuning of LLMs in federated learning (FL) environments. These approaches integrate parameter-efficient modules—typically low-rank adapters such as LoRA—into frozen transformer backbones, enabling distributed collaborative model adaptation across heterogeneous clients without sharing raw data or transmitting full model weights. Adapter-based FedLLMs form a central paradigm in contemporary research, supporting robust personalization, scalable deployment, and compliance with privacy and resource constraints in domains ranging from generic NLP to specialized fields such as medicine.

1. Formal Setting and Motivation

Adapter-based FedLLMs address two critical barriers to federated adaptation of large pre-trained LLMs: the prohibitive communication cost of exchanging full model weights and the significant local computation/memory requirements inherent to multi-billion-parameter models. In the FL context, clients iterate between local training—updating small adapter modules inserted into a frozen global backbone—and periodic aggregation of these adapters at a central server.

Formally, let $M$ clients possess their own datasets $\mathcal{D}_k$, and let $f_0$ denote a frozen LLM. Each client trains only its adapter parameters (e.g., low-rank matrices $A$, $B$, or per-adapter parameters $\phi_k$), minimizing a local loss:

$$\min_{\phi_k} L_k(\phi_k) := \mathbb{E}_{(x,y)\in\mathcal{D}_k}\left[\ell\left(f_0(x) + A_l(x;\phi_k),\, y\right)\right]$$

A global adapter $\theta$ is simultaneously fine-tuned via federated averaging (FedAvg), and clients periodically synchronize to aggregate their local adapter updates into an improved central model (Yang et al., 2024, Yao et al., 2024, Wu et al., 15 Mar 2025).

Adapter modules (e.g., LoRA, bottleneck adapters) typically account for $0.1\%$–$1\%$ of the full model parameters, reducing per-round communication by more than $90\%$ relative to classic FedAvg. This design also bounds train-time resource usage, making federated LLM fine-tuning tractable on edge hardware or institutionally partitioned compute (Yao et al., 2024, Li et al., 29 Jan 2026).
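
The arithmetic behind these savings can be sketched directly. The hidden size, LoRA rank, and number of adapted matrices below are illustrative assumptions, not figures taken from any cited paper:

```python
# Back-of-the-envelope comparison of LoRA vs. full-weight communication.
# All sizes (d, r, layer/matrix counts) are assumed for illustration.

def lora_params(d_model: int, rank: int, n_matrices: int) -> int:
    """Trainable parameters when each adapted d x d projection gets
    A (r x d) and B (d x r), i.e. 2*d*r parameters per matrix."""
    return n_matrices * 2 * d_model * rank

def full_params(d_model: int, n_matrices: int) -> int:
    """Parameters of the same projections if trained (and shipped) in full."""
    return n_matrices * d_model * d_model

d, r = 4096, 8          # assumed hidden size and LoRA rank
mats = 32 * 4           # e.g. 32 layers x 4 attention projections (assumed)

p_lora = lora_params(d, r, mats)
p_full = full_params(d, mats)
fraction = p_lora / p_full          # = 2r/d, about 0.39% here
comm_mb = p_lora * 2 / 2**20        # fp16 adapter upload per round, in MB
```

With these (assumed) settings the adapter upload is 16 MB per round versus roughly 4 GB for the full projections, consistent with the >90% reduction and the ~0.1–1% parameter fraction quoted above.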

2. Adapter Architectures and Insertion Strategies

Adapter-based FedLLMs utilize well-defined parameter-efficient fine-tuning (PEFT) modules:

  • Bottleneck Adapters: Two linear projections with a nonlinearity in between, inserted between frozen transformer sublayers. For $d$-dimensional hidden states and bottleneck width $r$, each adapter contains $\approx 2dr$ trainable parameters per layer (Cai et al., 2022, Wu et al., 15 Mar 2025).
  • LoRA Adapters: Low-rank decomposition of selected weight matrices (e.g., attention projections) as $\Delta W = BA$, with $A\in\mathbb{R}^{r\times d}$, $B\in\mathbb{R}^{d\times r}$, and $r\ll d$, inserted in parallel to $W$ (Yao et al., 2024, Li et al., 29 Jan 2026, Zhou et al., 30 May 2025).
  • Tensor-Train Adapters: Decomposition of projection matrices into a sequence of small TT-cores for further communication reduction and factor sharing (Ghiasvand et al., 2024).
  • Dual/Expert Adapters: Dual adapters (global+local) (Yang et al., 2024) and architectures with mixtures of LoRA experts assigned via data-driven criteria for personalization (Zhang et al., 2024).

Typical insertion points include all transformer layers, selective attention/FFN submodules, or even only the top layers for extreme efficiency configurations (Cai et al., 2022, Yao et al., 2024, Wu et al., 2024). Recent frameworks support per-layer dynamic adapter allocation and rank selection matched to client resource budgets or data heterogeneity (Zhou et al., 30 May 2025, Zhang et al., 2024).
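
A minimal sketch of the parallel LoRA insertion described above, using a numpy stand-in for one frozen projection. The `alpha/r` scaling and the zero initialization of $B$ are common conventions assumed here, not prescribed by the text:

```python
import numpy as np

# One frozen projection W augmented by a parallel low-rank update BA.
# Shapes follow the text: A in R^{r x d}, B in R^{d x r}, r << d.

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8

W = rng.normal(size=(d, d))            # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01     # trainable low-rank factor
B = np.zeros((d, r))                   # zero init: delta W starts at 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + (alpha/r) * B (A x); only A and B would be trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d,))
y = lora_forward(x)
```

Because $B$ is initialized to zero, the adapter is a no-op at the start of training, so `y` equals the frozen output `W @ x`; only the $2dr$ entries of $A$ and $B$ ever leave the client.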

3. Federated Training Protocols and Aggregation

Adapter-only federated optimization is typically instantiated via variants of FedAvg:

  • At each round $t$, the server broadcasts the current global adapter state (or both the global adapter and the frozen backbone for new clients).
  • Each selected client downloads the global parameters, initializes its local adapter(s), and runs $E$ epochs of local SGD or Adam to minimize its local loss.
  • Upon completion, the client uploads its updated adapters. For certain methods (e.g., dual adapters or expert mixtures), only the shared/global component or a selective subset is communicated, while highly personalized parameters remain private (Yang et al., 2024, Zhang et al., 2024).
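
The round structure above can be sketched as follows. The client count, dataset sizes, and the placeholder `local_update` are assumptions for illustration; the aggregation rule (data-size-weighted averaging) is standard FedAvg:

```python
import numpy as np

# One FedAvg round over adapter parameters only.
rng = np.random.default_rng(1)
d, r = 32, 4
global_adapter = {"A": np.zeros((r, d)), "B": np.zeros((d, r))}
client_sizes = [100, 300, 600]      # assumed local dataset sizes

def local_update(adapter: dict, seed: int) -> dict:
    """Placeholder for E local epochs: returns slightly perturbed params.
    A real client would run SGD/Adam on its private data here."""
    g = np.random.default_rng(seed)
    return {k: v + 0.01 * g.normal(size=v.shape) for k, v in adapter.items()}

# 1. Server broadcasts; each selected client trains its adapter locally.
client_adapters = [local_update(global_adapter, s) for s in range(3)]

# 2. Clients upload adapters; server aggregates, weighting by data size.
total = sum(client_sizes)
new_global = {
    k: sum(w / total * ca[k] for w, ca in zip(client_sizes, client_adapters))
    for k in global_adapter
}
```

Only the small `A`/`B` arrays cross the network in either direction; the frozen backbone is downloaded once and never re-transmitted.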

Aggregation typically follows FedAvg-style weighted averaging of the uploaded adapter parameters; method-specific variants average only the shared components.

Personalization is supported via local adapters or expert assignments that are never uploaded to the server (Yang et al., 2024, Zhang et al., 2024).

4. Handling Data Heterogeneity, Personalization, and System Heterogeneity

Federated LLM adaptation faces substantial data heterogeneity, including label skew, feature skew, and diverse annotation structure. Adapter-based FedLLMs address these through:

  • Dual or Heterogeneous Adapters: FedDPA (Yang et al., 2024) trains both a personalized adapter per client and a federated global adapter, enabling test-time selection/combination via a data-driven instance weighting function $\alpha(x)$ computed from the similarity of test examples to local data priors.
  • Mixtures of Adapters/Experts: FedAMoLE (Zhang et al., 2024) introduces client-specific assignment of LoRA expert adapters via a reverse selection mechanism optimizing both adaptation capacity and load balancing.
  • Sparse/Selective Updates: Curriculum techniques based on Fisher Information (Liu et al., 2024) select only “easy-to-learn” samples/batches early in training and adaptively update only the most informative layers or individual neurons, reducing both communication and compute.
  • Resource-Awareness: Algorithms such as AFLoRA (Zhou et al., 30 May 2025) dynamically prune low-importance adapter ranks or adapt aggregation weighting based on client capability, while SflLLM (Zhao et al., 20 Apr 2025) jointly optimizes split points, LoRA rank, and network resource allocation for edge deployments.
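
A toy version of the dual-adapter test-time mixing idea from the first bullet. The sigmoid-of-cosine-similarity form of $\alpha(x)$ and the `local_prototype` summary are hypothetical stand-ins, not the exact function used by any cited method:

```python
import numpy as np

# Test-time combination of personalized (local) and federated (global)
# adapter outputs via an instance-dependent weight alpha(x).
rng = np.random.default_rng(2)
d = 16

local_out = rng.normal(size=(d,))        # personalized-adapter output
global_out = rng.normal(size=(d,))       # shared global-adapter output
local_prototype = rng.normal(size=(d,))  # assumed summary of local data

def alpha(x: np.ndarray, prototype: np.ndarray) -> float:
    """Map cosine similarity of x to the local data prototype into (0, 1):
    inputs that look like local data lean toward the personalized adapter."""
    cos = x @ prototype / (np.linalg.norm(x) * np.linalg.norm(prototype))
    return float(1 / (1 + np.exp(-4 * cos)))   # sigmoid sharpening (assumed)

x = rng.normal(size=(d,))
a = alpha(x, local_prototype)
mixed = a * local_out + (1 - a) * global_out
```

In-distribution inputs get high $\alpha$ and mostly personalized behavior, while out-of-domain inputs fall back toward the global adapter, which is the mechanism behind the test-time robustness gains reported below.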

Empirically, these approaches yield +1–5% absolute improvement over plain FedAvg-LoRA under extreme heterogeneity or limited resource settings, and can match or surpass centralized fine-tuning in moderate regimes (Yang et al., 2024, Zhang et al., 2024, Ramesh et al., 10 Jun 2025, Ghiasvand et al., 2024).

5. Communication, Computation, and System Complexity

Adapters reduce per-round network and local resource demands by several orders of magnitude:

Method                       Relative Param Fraction   Typical Comm/round (MB)   Typical Speed-up
Full Fine-Tuning             100%                      >10,000                   —
LoRA, Adapter-Based          ~0.1–1%                   <100 (often <10)          10–100×
Tensor-Train Adapter         ~0.02–0.06%               <2                        up to 100×
FedAMoLE (Expert Mixt.)      0.01–0.5%                 10–15                     5–50×

These factors depend on model size, adapter configuration, and client population. For example, Fed-MedLoRA achieves ≈98.5% reduction in transmitted bytes per round (from 32 GB to 168 MB for an 8B LLaMA model) and is computationally feasible on consumer GPUs (14 GB for training, 6.8 GB for inference) (Li et al., 29 Jan 2026).

Adapter-based FedLLMs are fundamentally modular: clients require only frozen backbone weights and adapter updates, supporting flexible onboarding/dropout scenarios and device capability stratification (Yao et al., 2024, Cai et al., 2022, Wu et al., 2024, Zhao et al., 20 Apr 2025).

6. Empirical Performance, Personalization, and Test-Time Robustness

Extensive experiments on standard NLP (GLUE, Flan, AGNews, SNLI), medical IE (MIMIC-III, i2b2, YNHH), and instruction fine-tuning (Alpaca, WizardLM) benchmarks report that:

  • Adapter-based methods reach within 0.1–1% of full fine-tuning accuracy in classification and generation, often with as few as 2–4 communication rounds for stable domains (Yang et al., 2024, Wu et al., 15 Mar 2025, Li et al., 29 Jan 2026).
  • Dual-personalizing architectures (FedDPA) outperform classic FedLoRA baselines by 1–2 ROUGE/accuracy points on test-time distribution shift, and test-time dynamic mixing of local/global adapters improves out-of-domain robustness by 3–15 points (Yang et al., 2024).
  • Mixtures of LoRA experts with dynamic assignment (FedAMoLE) achieve improvements of +3–5% mean test accuracy per client under label/feature skew or multi-task distribution (Zhang et al., 2024).
  • In medical IE, Fed-MedLoRA+ outperforms single-site fine-tuning and domain-specific BERT by 10–70% absolute F1 in external validation, achieving near-centralized performance in real-world cross-institutional scenarios (Li et al., 29 Jan 2026).
  • Adaptive communication policies, curriculum, and rank-pruning yield an additional 70–98% speedup and effectively track optimal trade-offs under tight constraints (Liu et al., 2024, Ramesh et al., 10 Jun 2025, Zhou et al., 30 May 2025).

7. Security, Privacy, and Open Challenges

Contrary to earlier assumptions, small adapter sizes do not guarantee privacy: gradient inversion attacks (e.g., the UTR attack (Chen et al., 24 Jan 2026)) can reconstruct training data with near-perfect accuracy (ROUGE-1/2 ≈ 99–100), even from low-rank adapter gradients in federated settings. Differential privacy and extreme gradient pruning become effective only at the cost of severe accuracy degradation; no parameter-efficient adapter mechanism currently provides both strong utility and provable privacy guarantees (Chen et al., 24 Jan 2026, Yao et al., 2024).
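
A minimal sketch of the clip-and-noise style of differential-privacy defense alluded to here: each client clips its adapter update to a fixed L2 norm and adds Gaussian noise before upload. The clip norm and noise scale are illustrative, and (as the text notes) realistic settings trade away substantial accuracy:

```python
import numpy as np

# DP-SGD-style privatization of a client's flattened adapter update.
rng = np.random.default_rng(3)

def privatize(update: np.ndarray, clip: float = 1.0,
              sigma: float = 0.5, rng=rng) -> np.ndarray:
    """Clip the update to L2 norm <= clip, then add N(0, (sigma*clip)^2)
    noise elementwise; only the privatized vector is uploaded."""
    norm = np.linalg.norm(update)
    scaled = update * min(1.0, clip / norm)
    return scaled + rng.normal(scale=sigma * clip, size=update.shape)

raw = rng.normal(size=(256,)) * 5.0   # a client's raw adapter gradient
noisy = privatize(raw)                # what the server actually receives
```

Larger `sigma` strengthens the formal guarantee but drowns the signal in the averaged update, which is why the utility-privacy trade-off remains an open problem for adapter-based FedLLMs.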

Open challenges include achieving strong downstream utility together with provable privacy guarantees, and defending against gradient inversion attacks without severe accuracy degradation.

Adapter-based FedLLMs represent a critical direction enabling scalable, resource-aware, and (potentially) privacy-preserving adaptation of LLMs in decentralized, heterogeneous, and sensitive environments, but require further advances in privacy and security to be robust for real-world deployment (Yang et al., 2024, Yao et al., 2024, Li et al., 29 Jan 2026, Chen et al., 24 Jan 2026, Wu et al., 15 Mar 2025).
