FedLEASE Framework: Adaptive Federated Learning
- FedLEASE is a federated learning framework that leverages Low-Rank Adaptation modules to address heterogeneous client data and reduce communication overhead.
- It uses hierarchical clustering to automatically determine how many LoRA experts to allocate and which clients share them, and implements dynamic mixture-of-experts routing for personalized model fine-tuning.
- Experimental results show FedLEASE outperforms baselines on GLUE and FLAN benchmarks with significant accuracy and generation score improvements.
FedLEASE is a federated learning framework designed to handle heterogeneous client data distributions while enabling parameter-efficient fine-tuning of large models through adaptive expert allocation and selection. Leveraging Low-Rank Adaptation (LoRA) modules, FedLEASE introduces automated clustering-based expert specialization and a per-client adaptive mixture-of-experts mechanism. This enables scalable, communication-efficient, and personalized federated fine-tuning, particularly for LLMs, across large populations of diverse clients (Wang et al., 18 Sep 2025).
1. Motivation and Technical Problem
Federated fine-tuning for LLMs presents two fundamental challenges: (i) full model updates are infeasible due to model size and communication overhead; (ii) participant data are highly non-i.i.d., spanning different domains, tasks, or styles. Standard federated averaging (FedAvg) and single shared adapter approaches underperform because a single set of adapted representations is forced across incompatible client populations. LoRA restricts training to low-rank matrices $B \in \mathbb{R}^{d_1 \times r}$ and $A \in \mathbb{R}^{r \times d_2}$, whose product $BA$ is injected into each $d_1 \times d_2$ linear layer, sharply reducing local memory/compute and per-round communication from $\mathcal{O}(d_1 d_2)$ to $\mathcal{O}(r(d_1 + d_2))$ for $r \ll \min(d_1, d_2)$.
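For intuition, the per-layer update sizes can be compared directly; the minimal sketch below uses illustrative dimensions, not the paper's configuration.

```python
# Per-layer update size: dense delta-W vs. LoRA factors B (d1 x r) and
# A (r x d2).  Dimensions are illustrative, not taken from the paper.
d1, d2, r = 4096, 4096, 8

full_update = d1 * d2              # O(d1*d2) parameters for a dense update
lora_update = r * (d1 + d2)        # O(r*(d1+d2)) parameters for the LoRA factors

print(f"dense: {full_update:,}  lora: {lora_update:,}  "
      f"reduction: {full_update // lora_update}x")
```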
However, using a single LoRA module for all clients degrades performance under substantial heterogeneity, while assigning one expert per client forgoes any sharing of generalizable subspaces and scales poorly. FedLEASE directly addresses (1) how many LoRA experts should exist and which clients should share them, and (2) which combination of experts each client should leverage during fine-tuning and inference.
2. Core FedLEASE Methodology
FedLEASE structures the federated training in two phases: initialization and iterative expert-driven communication rounds.
- Initialization: Clients perform local LoRA pre-training; B-matrices are uploaded.
- Clustering: The server hierarchically clusters clients by B-matrix similarity, automatically selecting the number of clusters using the maximum silhouette score.
- Expert Allocation: Within each cluster, LoRA adapters (A, B) are averaged to instantiate experts.
- Federated Rounds: Clients receive all expert adapters and a specialized router. During each local training step, the router adaptively selects the optimal subset of experts (up to $M$), always including the client’s assigned expert.
The interaction of clustering, aggregated expert construction, and dynamic MoE routing distinguishes FedLEASE from prior federated parameter-efficient fine-tuning frameworks (Wang et al., 18 Sep 2025).
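The overall control flow can be summarized in a short schematic; every helper below is a hypothetical placeholder for a component detailed in Sections 3–5, not an actual API.

```python
# Schematic of the two-phase FedLEASE protocol.  Helper functions are
# hypothetical placeholders for the components detailed in Sections 3-5.

def local_lora_warmup(client):                 # Sec. 3: local LoRA pre-training
    raise NotImplementedError

def cluster_and_allocate(b_matrices):          # Sec. 3: silhouette-based clustering
    raise NotImplementedError                  # returns (assignments, experts, routers)

def local_train(client, experts, router, assigned):   # Sec. 4: adaptive top-M routing
    raise NotImplementedError                  # returns updated (adapter, router)

def aggregate_by_cluster(updates, assignments):        # Sec. 5: per-cluster averaging
    raise NotImplementedError

def fedlease(clients, num_rounds):
    # Phase 1: initialization -- local warm-up, then B-matrix upload to the server
    b_matrices = [local_lora_warmup(c) for c in clients]
    assignments, experts, routers = cluster_and_allocate(b_matrices)

    # Phase 2: iterative expert-driven communication rounds
    for _ in range(num_rounds):
        updates = [local_train(c, experts, routers[a], a)
                   for c, a in zip(clients, assignments)]
        experts, routers = aggregate_by_cluster(updates, assignments)
    return experts, routers
```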
3. Expert Clustering and Allocation Procedure
After the local warm-up epochs, each of the $K$ clients privately transmits its flattened B-matrices, concatenated across layers into $\mathrm{vec}(B_k)$, to the server. The pairwise distance between clients $i$ and $j$ is defined as

$$d(i, j) = \big\| \mathrm{vec}(B_i) - \mathrm{vec}(B_j) \big\|_2 .$$
Hierarchical agglomerative clustering is run for each candidate number of clusters $C$, forming clusters $\{\mathcal{G}_1, \dots, \mathcal{G}_C\}$. The average silhouette score

$$\bar{s}(C) = \frac{1}{K} \sum_{i=1}^{K} \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

selects $C^{*} = \arg\max_{C} \bar{s}(C)$, where $a(i)$ and $b(i)$ are client $i$'s average intra-cluster and nearest inter-cluster distances, respectively.
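A minimal server-side sketch of this selection step follows, using scikit-learn with Euclidean distances over the flattened B-matrices and average linkage (assumptions consistent with the definitions above, not the authors' released code).

```python
# Server-side sketch: pick the number of LoRA experts by hierarchically
# clustering flattened client B-matrices and maximizing the mean silhouette.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

def select_num_experts(B_flat: np.ndarray, c_max: int):
    """B_flat: (num_clients, dim) array of flattened, concatenated B-matrices."""
    best = (None, -1.0, None)                      # (C, score, labels)
    for c in range(2, min(c_max, len(B_flat) - 1) + 1):
        labels = AgglomerativeClustering(n_clusters=c, linkage="average").fit_predict(B_flat)
        score = silhouette_score(B_flat, labels)   # mean silhouette over clients
        if score > best[1]:
            best = (c, score, labels)
    return best

# Toy example: 16 synthetic clients drawn from 4 latent adaptation profiles.
rng = np.random.default_rng(0)
B_flat = np.vstack([rng.normal(loc=i % 4, scale=0.1, size=64) for i in range(16)])
c_star, _, labels = select_num_experts(B_flat, c_max=8)
print(c_star, labels)   # expected to recover the 4 latent profiles
```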
Within each cluster $\mathcal{G}_c$, the expert LoRA parameters are mean-aggregated:

$$A_c = \frac{1}{|\mathcal{G}_c|} \sum_{k \in \mathcal{G}_c} A_k, \qquad B_c = \frac{1}{|\mathcal{G}_c|} \sum_{k \in \mathcal{G}_c} B_k .$$
This clustering-then-averaging step reconciles the tradeoff between specialization and generalization: clients with similar adaptation profiles share an expert, and because intra-cluster diversity is kept low, each averaged expert remains specialized to its group.
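A per-cluster aggregation sketch, assuming unweighted averaging of each member's $(A, B)$ factors as in the equation above:

```python
# Sketch of expert instantiation: average each cluster's client LoRA
# factors (A, B) to form that cluster's expert, per the equation above.
import numpy as np

def build_experts(adapters, labels):
    """adapters: list of (A, B) numpy arrays per client; labels: cluster id per client."""
    experts = {}
    for c in sorted(set(labels)):
        members = [adapters[k] for k, lab in enumerate(labels) if lab == c]
        experts[c] = (np.mean([A for A, _ in members], axis=0),   # A_c
                      np.mean([B for _, B in members], axis=0))   # B_c
    return experts
```

Unweighted averaging is shown; weighting cluster members by local dataset size, as in FedAvg, would be a natural variant.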
4. Adaptive Top-M Mixture-of-Experts Routing
FedLEASE replaces hand-engineered or static expert selection with a per-client router $g_k(\cdot)$, which maps the layer input $x$ to $2M-1$ logits (where $M$ is the number of experts), normalized via softmax into routing weights $p = \mathrm{softmax}\big(g_k(x)\big) \in \mathbb{R}^{2M-1}$:
- Indices $1, \dots, M$: all refer to the client’s matched expert, so it is always eligible for selection.
- Indices $M+1, \dots, 2M-1$: correspond (in deterministic order) to the other $M-1$ experts.
The router selects the top $M$ entries of $p$; since $M$ of the $2M-1$ slots point to the assigned expert, the assigned expert is always present in the selection. For forward computation,

$$h = W_0 x + \sum_{c \in \mathcal{S}_k(x)} \tilde{p}_c \, B_c A_c x ,$$

where $c(k)$ is the client’s cluster/expert, $\mathcal{S}_k(x) \ni c(k)$ is the set of experts covered by the selected slots, and $\tilde{p}_c$ aggregates the selected routing weights mapping to expert $c$. The routing mechanism is fully differentiable and learns the client-specific optimal cardinality and combination of experts.
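The sketch below implements one layer's forward pass under this scheme; the duplicated-slot construction, tensor shapes, and weight-collapsing step reflect a reading of the description above rather than the reference implementation.

```python
# One-layer forward pass with adaptive top-M routing (illustrative sketch).
import torch

def moe_layer_forward(x, W0, experts, router, assigned, M):
    """
    x:        (batch, d_in) layer input
    W0:       (d_out, d_in) frozen base weight
    experts:  list of M pairs (A, B); A: (r, d_in), B: (d_out, r)
    router:   module mapping (batch, d_in) -> (batch, 2M-1) logits
    assigned: index of the client's own expert
    """
    probs = torch.softmax(router(x), dim=-1)               # (batch, 2M-1)

    # Slots 0..M-1 all point to the assigned expert; slots M..2M-2 point to
    # the remaining experts in a fixed order, so every top-M pick over the
    # 2M-1 slots necessarily contains the assigned expert.
    slot_to_expert = [assigned] * M + [e for e in range(M) if e != assigned]

    top_w, top_idx = probs.topk(M, dim=-1)                  # (batch, M) each
    out = x @ W0.T                                          # frozen base path
    for b in range(x.shape[0]):
        weights = {}                                        # expert -> summed weight
        for w, s in zip(top_w[b], top_idx[b]):
            e = slot_to_expert[int(s)]
            weights[e] = weights.get(e, 0.0) + w
        for e, w in weights.items():                        # LoRA expert paths
            A, B = experts[e]
            out[b] = out[b] + w * (x[b] @ A.T @ B.T)
    return out
```

Here `router` could be as simple as `torch.nn.Linear(d_in, 2 * M - 1)`; per-token routing for sequence inputs and vectorized batching are omitted for clarity.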
5. Federated Optimization and Communication Protocol
The iterative training protocol:
- Each client receives all expert adapters and the router.
- Only the assigned expert and router are updated locally per client.
- After each round, local adapter/router updates are communicated.
- Server aggregates adapters and routers within each cluster to update global experts/routers.
Only the low-rank adapter weights and router parameters are transferred, minimizing communication compared to transmitting full LLM parameters. Under standard smoothness and strong-convexity assumptions with bounded cluster divergence, convergence to a neighborhood of a stationary point can be established (see (Wang et al., 18 Sep 2025), Appendix C).
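A back-of-the-envelope estimate of the per-client, per-round upload under assumed (not reported) dimensions illustrates the savings:

```python
# Rough per-client, per-round upload: the assigned LoRA expert's (A, B)
# factors for each adapted layer plus the router parameters.  All
# dimensions below are assumptions for illustration, not reported values.
n_layers, d, r, M = 32, 4096, 8, 4     # adapted layers, hidden size, rank, experts

adapter_params = n_layers * r * (2 * d)          # A (r x d) and B (d x r) per layer
router_params = n_layers * d * (2 * M - 1)       # assuming a per-layer linear router
full_model_params = 7_000_000_000                # e.g. a 7B-parameter LLM

per_round = adapter_params + router_params
print(f"upload per round: {per_round:,} parameters "
      f"({per_round / full_model_params:.4%} of the full model)")
```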
6. Experimental Evaluation and Comparative Analysis
FedLEASE was validated on both natural language understanding (NLU; GLUE suite) and generation (NLG; FLAN tasks) across RoBERTa-Large (16 clients) and LLaMA-2-7B (8 clients, 8-bit quantized). Baseline comparisons included FedIT (FedAvg+single LoRA), FFA-LoRA, FedDPA, FedSA, and IFCA+LoRA.
- GLUE (mean accuracy): FedLEASE achieved 87.76%, exceeding the best baseline by 3.16 percentage points.
- FLAN (generation score): FedLEASE reached 61.70, exceeding the best baseline by 1.50 points.
Ablation studies confirmed:
- Automated clustering with the silhouette-selected number of experts yields optimal NLU transfer.
- Cluster-level router averaging surpasses per-client routers.
- Adaptive top-$M$ routing consistently outperforms fixed MoE selection.
FedLEASE remained robust to varying local epochs, LoRA rank, number of clients, data heterogeneity, and the upper bound on the number of experts.
| Benchmark | Best Baseline | FedLEASE | Margin |
|---|---|---|---|
| GLUE (mean accuracy, %) | 84.60 | 87.76 | +3.16 |
| FLAN (generation score) | 60.20 | 61.70 | +1.50 |
7. Limitations and Future Directions
FedLEASE’s initial clustering is static; clients joining, leaving, or shifting data distributions may warrant dynamic re-clustering, meta-routing, or non-hierarchical expert assignment. While LoRA provides scalability, extension to other parameter-efficient fine-tuning techniques, such as adapters or prefix-tuning, is straightforward. Multi-modal generalizations are anticipated.
A plausible implication is that automated cluster-based adapter sharing and adaptive routing architectures will remain a subject of active research, with dynamic specialization and communication adaptivity critical for federated personalization at scale (Wang et al., 18 Sep 2025).