
FedLEASE Framework: Adaptive Federated Learning

Updated 19 November 2025
  • FedLEASE is a federated learning framework that leverages Low-Rank Adaptation modules to address heterogeneous client data and reduce communication overhead.
  • It uses hierarchical clustering to automatically allocate adaptive LoRA experts and implements dynamic mixture-of-experts routing for personalized model fine-tuning.
  • Experimental results show FedLEASE outperforms baselines on GLUE and FLAN benchmarks with significant accuracy and generation score improvements.

FedLEASE is a federated learning framework designed to address heterogeneous client data distributions and parameter-efficient fine-tuning of large models through adaptive expert allocation and selection. Leveraging Low-Rank Adaptation (LoRA) modules, FedLEASE introduces automated clustering-based expert specialization and a per-client adaptive mixture-of-experts mechanism. This enables scalable, communication-efficient, and personalized federated fine-tuning, particularly for LLMs, across numerous diverse-client environments (Wang et al., 18 Sep 2025).

1. Motivation and Technical Problem

Federated fine-tuning for LLMs presents two fundamental challenges: (i) full model updates are infeasible due to model size and communication overhead; (ii) participant data are highly non-i.i.d., spanning different domains, tasks, or styles. Standard federated averaging paradigms (FedAvg) and single shared-adapter approaches underperform because updates from incompatible client populations are forced into one shared representation. LoRA restricts training to low-rank matrices $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{l \times r}$ injected into each linear layer, sharply reducing local memory/compute and per-round communication from $\mathcal{O}(ld)$ to $\mathcal{O}(r(l+d))$ for $r \ll \min(l, d)$.
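
As a concrete illustration of this parametrization (a minimal sketch, not the paper's code; the `LoRALinear` name, initialization, and scaling follow common LoRA practice and are assumptions here), the module below freezes the base weight $W_0$ and trains only the low-rank pair $(A, B)$; the final lines print the per-layer parameter counts behind the $\mathcal{O}(ld)$ versus $\mathcal{O}(r(l+d))$ comparison.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W0 (l x d) plus a trainable low-rank update B @ A."""
    def __init__(self, d_in: int, d_out: int, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                  # W0 stays frozen
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)   # A in R^{r x d}
        self.B = nn.Parameter(torch.zeros(d_out, rank))         # B in R^{l x r}
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W0 x plus the scaled low-rank correction B A x
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Per-layer parameters transmitted: l*d for full weights vs r*(l+d) for LoRA.
l, d, r = 1024, 1024, 8
print(l * d, r * (l + d))   # 1048576 vs 16384
```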

However, a single LoRA module shared by all clients degrades under substantial heterogeneity, while assigning one expert per client forgoes generalizable shared subspaces and is inefficient. FedLEASE therefore directly addresses (1) how many LoRA experts should exist and which clients should share them; and (2) which combination of experts each client should leverage during fine-tuning and inference.

2. Core FedLEASE Methodology

FedLEASE structures the federated training in two phases: initialization and iterative expert-driven communication rounds.

  • Initialization: Clients perform local LoRA pre-training; B-matrices are uploaded.
  • Clustering: The server hierarchically clusters clients by B-matrix similarity, automatically selecting the number of clusters $M$ using the maximum silhouette score.
  • Expert Allocation: Within each cluster, LoRA adapters (A, B) are averaged to instantiate $M$ experts.
  • Federated Rounds: Clients receive all $M$ expert adapters and a specialized router. During each local training step, the router adaptively selects the optimal subset of experts (up to $M$), always including the client’s assigned expert.

The interaction of clustering, aggregated expert construction, and dynamic MoE routing distinguishes FedLEASE from prior federated parameter-efficient fine-tuning frameworks (Wang et al., 18 Sep 2025).

3. Expert Clustering and Allocation Procedure

After $E$ epochs of local warm-up, each client $i$ privately transmits the flattened $B_i^l$ for each layer $l \in L$ to the server. The pairwise distance is defined:

$$d(i, j) = \frac{1}{|L|} \sum_{l \in L} \left[ 1 - \cos(B_i^l, B_j^l) \right], \qquad \cos(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$
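
A minimal NumPy sketch of this layer-averaged cosine distance, assuming each client's per-layer $B$ matrices have already been flattened to vectors (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def pairwise_lora_distance(B_by_client):
    """B_by_client[i][l]: flattened B-matrix of client i at layer l (1-D array).

    Returns the N x N matrix of layer-averaged cosine distances
    d(i, j) = (1/|L|) * sum_l [1 - cos(B_i^l, B_j^l)].
    """
    n, num_layers = len(B_by_client), len(B_by_client[0])
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            acc = 0.0
            for l in range(num_layers):
                u, v = B_by_client[i][l], B_by_client[j][l]
                cos = float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
                acc += 1.0 - cos
            dist[i, j] = dist[j, i] = acc / num_layers
    return dist
```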

Hierarchical agglomerative clustering is run for each candidate $k \in \{2, \ldots, M_{\max}\}$, forming clusters $\{C_1^k, \ldots, C_k^k\}$. The average silhouette score

$$S(k) = \frac{1}{N} \sum_{i=1}^{N} s_i^k, \qquad s_i^k = \frac{b_i^k - a_i^k}{\max(a_i^k, b_i^k)}$$

selects $M = \arg\max_k S(k)$, where $a_i^k$ and $b_i^k$ are the average intra-cluster and nearest inter-cluster distances, respectively.
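
One plausible server-side realization of this model-selection step uses SciPy's agglomerative clustering on the precomputed distance matrix and scikit-learn's silhouette score; the helper name and the choice of average linkage are assumptions, since the summary does not fix the linkage criterion:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_score

def select_num_experts(dist, m_max):
    """Run hierarchical clustering for k = 2..m_max on a square distance matrix
    and keep the partition with the highest average silhouette score."""
    Z = linkage(squareform(dist, checks=False), method="average")
    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(2, m_max + 1):
        labels = fcluster(Z, t=k, criterion="maxclust")          # cluster ids 1..k
        score = silhouette_score(dist, labels, metric="precomputed")
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels    # M = argmax_k S(k), plus cluster assignments
```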

Within each cluster $C_j$, the expert LoRA parameters are mean-aggregated:

$$A_j^{\text{expert}} = \frac{1}{|C_j|} \sum_{i \in C_j} A_i, \qquad B_j^{\text{expert}} = \frac{1}{|C_j|} \sum_{i \in C_j} B_i$$

This clustering-based aggregation reconciles the tradeoff between specialization and generalization: clients with similar adaptation profiles share an expert, while diversity within each cluster remains small.
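
A short sketch of the per-cluster averaging for a single layer, with NumPy arrays and illustrative names; note that it averages $A$ and $B$ separately, exactly as written above:

```python
import numpy as np

def build_experts(A_by_client, B_by_client, labels):
    """Average each cluster's (A, B) adapters into one expert per cluster.

    A_by_client[i], B_by_client[i]: client i's adapter matrices for one layer.
    labels[i]: cluster index assigned to client i.
    """
    experts = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        A_exp = np.mean([A_by_client[i] for i in members], axis=0)
        B_exp = np.mean([B_by_client[i] for i in members], axis=0)
        experts[c] = (A_exp, B_exp)
    return experts
```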

4. Adaptive Top-M Mixture-of-Experts Routing

FedLEASE replaces hand-engineered or static expert selection with a per-client router $G_i \in \mathbb{R}^{(2M-1) \times d}$, which maps the input $x$ to $2M-1$ logits, normalized via softmax into weights $\hat{\omega}$:

  • Indices $0, \ldots, M-1$: all map to the client’s matched (assigned) expert, so any top-$M$ selection over the $2M-1$ slots necessarily includes it.
  • Indices $M, \ldots, 2M-2$: correspond, in a deterministic order, to the other experts.

The router selects the top $M$ entries of $\hat{\omega}$, ensuring the assigned expert is always present. For forward computation:

$$y = W_0 x + \sum_{p \in \mathrm{TopK}(\hat{\omega}, M)} \hat{\omega}_p \cdot \begin{cases} B_j^{\text{expert}} A_j^{\text{expert}} x & p < M \\ B_{p-M+1}^{\text{expert}} A_{p-M+1}^{\text{expert}} x & p \geq M \end{cases}$$

where $j$ is the client’s cluster/expert index. The routing mechanism is fully differentiable and learns the client-specific optimal cardinality and assignment of experts.
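
The PyTorch sketch below shows one way to realize this routing rule for a single adapted layer; the mapping of slots $M, \ldots, 2M-2$ onto the non-assigned experts, the per-sample loop, and all class and argument names are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTopMRouter(nn.Module):
    """Router over 2M-1 logits: slots 0..M-1 all map to the client's assigned
    expert j, so any top-M selection contains j by the pigeonhole argument."""

    def __init__(self, d_model: int, num_experts: int, assigned: int):
        super().__init__()
        self.M = num_experts
        self.assigned = assigned                                # expert index j
        self.gate = nn.Linear(d_model, 2 * num_experts - 1, bias=False)
        # assumed deterministic ordering of the remaining M-1 experts
        self.others = [e for e in range(num_experts) if e != assigned]

    def forward(self, x, W0, experts):
        """x: (batch, d); W0: (l, d) frozen base weight;
        experts[e] = (A_e, B_e) with A_e: (r, d) and B_e: (l, r)."""
        omega = F.softmax(self.gate(x), dim=-1)                 # (batch, 2M-1)
        top_w, top_idx = torch.topk(omega, self.M, dim=-1)      # top-M slots
        base = x @ W0.T                                         # frozen W0 x
        rows = []
        for b in range(x.shape[0]):           # per-sample routing, clarity over speed
            delta = torch.zeros_like(base[b])
            for slot in range(self.M):
                p = int(top_idx[b, slot])
                e = self.assigned if p < self.M else self.others[p - self.M]
                A_e, B_e = experts[e]
                delta = delta + top_w[b, slot] * (x[b] @ A_e.T @ B_e.T)
            rows.append(base[b] + delta)
        return torch.stack(rows)
```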

5. Federated Optimization and Communication Protocol

The iterative training protocol:

  • Each client receives all expert adapters and the router.
  • Only the assigned expert and router are updated locally per client.
  • After each round, local adapter/router updates are communicated.
  • Server aggregates adapters and routers within each cluster to update global experts/routers.

Only the low-rank adapter weights and router parameters are transferred, minimizing communication compared to transmitting full LLM parameters. Under usual smoothness and strong convexity assumptions and bounded cluster divergence, convergence to a stationary point neighborhood can be established (see (Wang et al., 18 Sep 2025), Appendix C).
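
As a schematic of the cluster-wise aggregation step at the server (dictionary keys, names, and the single-layer scope are hypothetical; a real implementation would cover every adapted layer and the routers):

```python
import numpy as np

def server_aggregate(updates, labels):
    """One round of cluster-wise averaging of client updates.

    updates[i]: dict of client i's locally updated tensors,
                e.g. {"A": ..., "B": ..., "router": ...} (illustrative keys).
    labels[i]:  the cluster / expert index assigned to client i.
    Returns one averaged parameter set per cluster.
    """
    aggregated = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        aggregated[c] = {
            key: np.mean([updates[i][key] for i in members], axis=0)
            for key in updates[members[0]]
        }
    return aggregated
```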

6. Experimental Evaluation and Comparative Analysis

FedLEASE was validated on both natural language understanding (NLU; GLUE suite) and generation (NLG; FLAN tasks) across RoBERTa-Large (16 clients) and LLaMA-2-7B (8 clients, 8-bit quantized). Baseline comparisons included FedIT (FedAvg+single LoRA), FFA-LoRA, FedDPA, FedSA, and IFCA+LoRA.

  • GLUE (mean accuracy): FedLEASE achieved 87.76%, exceeding the best baseline by 3.16 percentage points.
  • FLAN (generation score): FedLEASE reached 61.70, with a margin of +1.50 over all alternatives.

Ablation studies confirmed:

  • Automated clustering with $M=4$ experts yields optimal NLU transfer.
  • Cluster-level router averaging surpasses per-client routers.
  • Adaptive top-$M$ routing consistently outperforms fixed-$k$ MoE selection.

FedLEASE remained robust to varying local epochs, LoRA rank, number of clients, data heterogeneity, and the expert upper bound $M_{\max}$.

| Benchmark | Best Baseline | FedLEASE | Margin |
|---|---|---|---|
| GLUE (mean accuracy, %) | 84.60 | 87.76 | +3.16 |
| FLAN (generation score) | 60.20 | 61.70 | +1.50 |

7. Limitations and Future Directions

FedLEASE’s initial clustering is static; clients joining, leaving, or shifting data distributions may warrant dynamic re-clustering, meta-routing, or non-hierarchical expert assignment. While LoRA provides scalability, extension to other parameter-efficient fine-tuning techniques, such as adapters or prefix-tuning, is straightforward. Multi-modal generalizations are anticipated.

A plausible implication is that automated cluster-based adapter sharing and adaptive routing architectures will remain a subject of active research, with dynamic specialization and communication adaptivity critical for federated personalization at scale (Wang et al., 18 Sep 2025).
