
Prompt-Based Federated Learning

Updated 26 March 2026
  • Prompt-based federated learning is a paradigm that adapts frozen foundation models using lightweight prompt tuning to enhance privacy and efficiency in collaborative settings.
  • It reduces communication overhead by updating only small prompt parameters, enabling resource-efficient personalization and robust performance on heterogeneous data.
  • Empirical studies show that methods like FedPGP balance global model robustness with local adaptation, achieving high accuracy on diverse benchmarks.

Prompt-Based Federated Learning (PBFL) applies prompt learning (parameter-efficient adaptation of large pretrained models) within federated learning (FL) systems, enabling clients to collaboratively tailor frozen foundation models (typically vision, language, or vision-language models) by updating only small prompt parameters rather than full model weights. The approach is motivated by the need for computational, communication, and privacy efficiency, particularly in heterogeneous, resource-constrained, and privacy-sensitive settings. PBFL leverages the strong generalization of large models while allowing data- and domain-adaptive personalization via prompts, and is rapidly establishing itself as a leading methodology for collaborative adaptation of foundation models.

1. Core Principles and System Architecture

At its core, PBFL freezes a pretrained foundation model (e.g., CLIP, RoBERTa, Transformer-based weather model) on all clients and trains only a small set of prompt parameters in a federated manner (Guo et al., 2022, Cui et al., 2024, Liao et al., 28 Mar 2025, Gong et al., 2024, Chen et al., 2023). The standard workflow is:

  • Initialization: The server maintains a global prompt (or prompt-generator parameters), which is broadcast to selected clients at each communication round.
  • Local Update: Clients perform task-specific prompt updates via gradient descent using local data, with the backbone kept frozen to maximally exploit pretrained generalization and minimize local computation.
  • Aggregation: Clients upload their updated prompt parameters (not model weights, activations, or gradients) to the server. The server aggregates (typically via FedAvg) to form a new global prompt.
  • Personalization (optional): Many advanced variants (e.g., pFedMoAP (Luo et al., 2024), FedPGP (Cui et al., 2024), FedMGP (Bo et al., 1 Nov 2025), SDFed (Di et al., 9 Feb 2026)) introduce local, low-rank, or mixture-of-expert prompt terms to allow per-client adaptation while preserving a global generalization backbone.
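
The workflow above can be sketched end to end with a toy simulation. This is a minimal illustration, not any paper's implementation: the "backbone" is absent (frozen), each client's prompt-tuning loss is replaced by a hypothetical quadratic $F_i(\theta) = \tfrac{1}{2}\|\theta - t_i\|^2$ whose optimum $t_i$ stands in for the client's locally ideal prompt, and aggregation is plain FedAvg over prompt vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical local optima t_i stand in for each client's ideal prompt;
# the frozen backbone never appears because it is never updated.
client_targets = [rng.normal(size=8) for _ in range(4)]
client_sizes = np.array([100, 50, 200, 150])   # local dataset sizes n_i

theta = np.zeros(8)          # global prompt maintained by the server
lr, local_steps = 0.5, 5

for _ in range(20):                              # communication rounds
    updates = []
    for t_i in client_targets:                   # 1) broadcast global prompt
        theta_i = theta.copy()
        for _ in range(local_steps):             # 2) local prompt updates
            grad = theta_i - t_i                 # dF_i/dtheta for the toy loss
            theta_i -= lr * grad
        updates.append(theta_i)                  # 3) upload prompts only
    weights = client_sizes / client_sizes.sum()
    theta = sum(w * u for w, u in zip(weights, updates))   # 4) FedAvg

# The global prompt converges to the size-weighted mean of the local optima.
expected = sum(w * t for w, t in zip(client_sizes / client_sizes.sum(),
                                     client_targets))
print(np.allclose(theta, expected, atol=1e-3))   # → True
```

Only the 8-dimensional prompt crosses the network each round, which is the source of PBFL's communication savings relative to shipping full model weights.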

PBFL supports both text and visual prompts depending on the foundation model architecture (2505.23024, Gong et al., 2024). The prompt representation can be a learnable embedding (prepending context tokens for LLMs, or patch/border tokens for visual models), a specialized prompt-generator network (Qiu et al., 2023, Prasad et al., 17 Aug 2025), or a combination thereof.

2. Mathematical Formalisms and Optimization Objectives

Across PBFL, the learning objective is to minimize a weighted average of local, prompt-parameterized losses:

\min_{\theta \in \mathbb{R}^{m \times d}} F(\theta) = \sum_{i=1}^{C} \frac{n_i}{N} F_i(\theta),

where $F_i(\theta)$ is the per-client prompt-tuning loss (e.g., cross-entropy, contrastive, or task-specific), $\theta$ collects one or more prompt matrices/vectors, $n_i$ is the local data size, and $N = \sum_{i=1}^{C} n_i$ (Liao et al., 28 Mar 2025, Guo et al., 2022).

  • Vision-Language Models: For CLIP, the classification probability for class $j$ given image $x$ and prompt $\theta$ is

p(y=j \mid x; \theta) = \frac{\exp\big(\operatorname{sim}(E_{\mathrm{image}}(x), E_{\mathrm{text}}(t_j(\theta))) / \tau\big)}{\sum_{k=1}^{K} \exp\big(\operatorname{sim}(E_{\mathrm{image}}(x), E_{\mathrm{text}}(t_k(\theta))) / \tau\big)},

where $t_j(\theta)$ is the prompt-augmented text sequence, $\operatorname{sim}$ is cosine similarity, and $\tau$ is the softmax temperature (Guo et al., 2022).
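
This prompt-conditioned softmax can be illustrated numerically. The sketch below uses random vectors as stand-ins for the frozen encoders' outputs $E_{\mathrm{image}}(x)$ and $E_{\mathrm{text}}(t_j(\theta))$; in a real system those would come from CLIP, with only $\theta$ trainable.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def class_probs(img_emb, text_embs, tau=0.07):
    # p(y=j|x) = softmax_j( sim(E_image(x), E_text(t_j(theta))) / tau )
    sims = np.array([cosine(img_emb, t) for t in text_embs]) / tau
    sims -= sims.max()                     # subtract max for numerical stability
    e = np.exp(sims)
    return e / e.sum()

rng = np.random.default_rng(1)
img = rng.normal(size=16)                        # stand-in for E_image(x)
texts = [rng.normal(size=16) for _ in range(5)]  # stand-ins for E_text(t_j(theta))
texts[2] = img + 0.1 * rng.normal(size=16)       # class 2 is aligned with the image

p = class_probs(img, texts)
print(p.argmax())   # → 2
```

Prompt tuning moves the text embeddings (via $\theta$) so that each class's embedding aligns with the images of that class, exactly as the hand-planted alignment for class 2 does here.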

  • Personalization and Generalization Balance: Models such as FedPGP (Cui et al., 2024) introduce a global prompt $p_G$ combined with a per-client low-rank adaptation $\Delta p_i = U_i V_i$, yielding $p_i = p_G + \Delta p_i$. The loss

\mathcal{L}^i(p_G, U_i, V_i) = \mathcal{L}_{\mathrm{cls}}^i(p_G + U_i V_i) + \mu\, \mathcal{L}_{\mathrm{con}}^i(p_G, U_i, V_i)

incorporates both a standard classification loss and a contrastive prompt-wise regularization to inject global knowledge and promote local specificity (Cui et al., 2024).
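
The low-rank construction itself is a few lines of linear algebra. The sketch below uses hypothetical dimensions (4 prompt tokens, embedding width 16, rank 2) and just shows the shape arithmetic and the parameter savings, not FedPGP's training loop.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d, r = 4, 16, 2          # prompt tokens, embedding dim, adaptation rank

p_G = rng.normal(size=(m, d))            # shared global prompt (aggregated)
U_i = rng.normal(size=(m, r)) * 0.1      # client-private low-rank factors
V_i = rng.normal(size=(r, d)) * 0.1

p_i = p_G + U_i @ V_i                    # personalized prompt p_i = p_G + U_i V_i

# The personal term costs only m*r + r*d parameters instead of m*d.
full = m * d
low_rank = m * r + r * d
print(low_rank, full)   # → 40 64
```

Because only $p_G$ (or its update) is aggregated while $U_i, V_i$ stay local, the rank $r$ directly controls how much per-client capacity is spent on deviation from the global prompt.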

  • Federated Aggregation: Typically, only prompt parameters (or their generated updates) are aggregated. For example, the server sets

\theta^{(t+1)} = \sum_{i \in S_t} \frac{n_i}{\sum_{j \in S_t} n_j}\, \theta_i^{(t+1)},

where $S_t$ is the set of clients selected in round $t$, drastically reducing communication overhead compared to full-model updates (Guo et al., 2022, Liao et al., 28 Mar 2025).
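
The server-side step is a direct transcription of the aggregation formula; the function name below is hypothetical.

```python
import numpy as np

def aggregate_prompts(local_prompts, sizes):
    """FedAvg over prompt parameters only:
    theta^{(t+1)} = sum_i (n_i / sum_j n_j) * theta_i^{(t+1)}."""
    sizes = np.asarray(sizes, dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * p for w, p in zip(weights, local_prompts))

# Two clients with prompt matrices of all 1s and all 3s, weighted 100:300.
prompts = [np.full((2, 4), 1.0), np.full((2, 4), 3.0)]
theta = aggregate_prompts(prompts, sizes=[100, 300])
print(theta[0, 0])   # → 2.5
```

Since a prompt matrix is typically a few thousand floats, this sum is negligible server-side work compared to aggregating full model weights.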

3. Design Patterns and Advanced Methodologies

PBFL has rapidly evolved into a vibrant ecosystem of algorithmic innovations:

  • Prompt Modalities: Both text and visual prompts are supported. Visual prompts may be padding, patch tokens, style-injection vectors, or parameterized functions over client data (Li et al., 2023, Prasad et al., 17 Aug 2025). Federation can optimize either or both modalities independently or jointly (see VLPT in (2505.23024); PLAN in (Gong et al., 2024)).
  • Personalization Strategies: Approaches include low-rank or local additions to a global prompt (Cui et al., 2024), multi-group paired prompt "experts" per client (Bo et al., 1 Nov 2025), oracle or data-adaptive mixtures (e.g., pFedMoAP (Luo et al., 2024)), and subspace projections for divergence control (Di et al., 9 Feb 2026).
  • Prompt Generators: Instead of explicit prompt vectors per class, some methods federate a prompt-generator network $G(T; \theta)$ parameterized by client- or class-specific textual/task embeddings (Qiu et al., 2023, Prasad et al., 17 Aug 2025). This approach generalizes naturally to unseen classes/domains and is robust to data partitioning (Qiu et al., 2023).
  • Federated Domain Generalization: In scenarios where clients represent distinct domains, federated prompt learning replaces sharing of raw statistics or prototypes (potential privacy risk) with indirect knowledge transfer via prompt exchange and attention-based or GAN-based prompt aggregators (Gong et al., 2024, Wu et al., 25 Sep 2025).
  • Mixture and Specialist Designs: Architectures such as FedMGP (Bo et al., 1 Nov 2025) distribute prompt capacity across multiple specialist groups, enforcing intra-client prompt diversity and dynamically aggregating only those most semantically aligned with the global prompt.
  • Communication-Efficient Protocols: Most PBFL algorithms restrict communication to lightweight prompt parameters (typ. <1 MB per round), with strategies to further sparsify or regularize updates via prompt-level similarity graphs (Chen et al., 2023), attention-based aggregation (Gong et al., 2024), or dynamic selection (Bo et al., 1 Nov 2025).
  • Federated Robustness and Security: Recent work (Zhang et al., 11 Aug 2025) identifies prompt-tuning as a vulnerable attack surface. Backdoor prompts can propagate through global aggregation and trigger malicious behaviors. Conventional FL defenses (e.g., DP, robust aggregation) show incomplete mitigation.

4. Empirical Findings and Benchmarks

PBFL displays strong quantitative and qualitative performance across a diverse benchmark suite, as extensively catalogued in FLIP (Liao et al., 28 Mar 2025), PLAN (Gong et al., 2024), FedPGP (Cui et al., 2024), and related works. Key empirical results include:

| Method | Global Acc (%) | Personalization (%) | Base/Novel HM (%) | Comms. per Round | Robust to Non-IID |
|---|---|---|---|---|---|
| PromptFL | 69.6 | 75.2 | 63.8 | ∼2 MB | Yes |
| FedOTP | 69.4 | 76.5 | 63.7 | ∼4 MB | Yes |
| FedPGP | 87.3 (HM) | 91.5–99.3 | 81.8 (Novel) | ∼1 MB | Yes |
| FedCSAP | 76.06 (HM) | — | 75.61 (New) | ∼1–2 MB | Yes |
| PLAN | up to 97.4 | — | — | ∼1.3 MB | Yes |

  • PromptFL (Guo et al., 2022, Liao et al., 28 Mar 2025) is a strong baseline, matching full CLIP fine-tuning accuracy at less than 5% of the communication cost, and is robust to both IID and non-IID partitions.
  • Personalization and Generalization Trade-off: FedPGP (Cui et al., 2024) and similar models balance local accuracy and out-of-domain performance via low-rank adaptation and contrastive regularization, with harmonic mean of local/base/novel accuracy exceeding all prior methods.
  • Heterogeneous Settings: SDFed (Di et al., 9 Feb 2026) shows that allowing variable-length local prompts with subspace refinement and divergence control yields consistent gains under strong heterogeneity.
  • Domain Generalization: PLAN (Gong et al., 2024) and FedDSPG (Wu et al., 25 Sep 2025) achieve state-of-the-art accuracy in cross-domain benchmarks by learning and aggregating both text and visual prompts using adaptive, privacy-preserving attention mechanisms.
  • Scalability & Efficiency: Communication and compute are reduced by one to three orders of magnitude relative to full-model FL, enabling practical on-device deployment (Chen et al., 2023, Liao et al., 28 Mar 2025).

5. Theoretical Foundations and Convergence Analysis

PBFL has been subject to rigorous theoretical analysis (Pan et al., 2024, Di et al., 9 Feb 2026):

  • Feature-Learning Theory: Signal learning (task-relevant prompt coefficients) versus noise memorization (prompt drift into spurious/null-space directions) governs test accuracy. The test error is determined by the ratio μ/σ of signal-to-noise in prompt components (Pan et al., 2024).
  • Portfolio Analogy: Optimal mixing of global and local prompts is formalized analogously to mean-variance portfolio optimization, yielding closed-form solutions for the mixing coefficient as a function of client heterogeneity (Pan et al., 2024).
  • Subspace Refinement (SDFed): Local prompt subspace orthogonal to the dominant global prompt ensures efficient knowledge transfer while minimizing prompt conflicts. Convergence to stationarity is guaranteed under standard FL conditions (Di et al., 9 Feb 2026).
  • Dynamic Aggregation (FedMGP): Softmax-weighted sampling and aggregation, guided by cosine similarity between local and global prompt banks, provably improves generalization and reduces bias compared to indiscriminate averaging (Bo et al., 1 Nov 2025).
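
The similarity-guided aggregation idea in the last bullet can be sketched generically. This is a simplified stand-in for FedMGP's mechanism, not its actual algorithm: local prompts are weighted by a temperature-scaled softmax over their cosine similarity to the current global prompt, so divergent prompts contribute less than under uniform averaging.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def similarity_weighted_aggregate(local_prompts, global_prompt, temp=0.5):
    # Weight each local prompt by softmax(cos(p_i, p_G) / temp)
    # instead of averaging all prompts uniformly.
    g = global_prompt.ravel()
    sims = np.array([
        p.ravel() @ g / (np.linalg.norm(p) * np.linalg.norm(g))
        for p in local_prompts
    ])
    w = softmax(sims / temp)
    return sum(wi * p for wi, p in zip(w, local_prompts)), w

rng = np.random.default_rng(3)
g = rng.normal(size=(4, 8))                       # current global prompt
candidates = [g + 0.05 * rng.normal(size=(4, 8)),  # well-aligned client
              rng.normal(size=(4, 8))]             # divergent client
agg, w = similarity_weighted_aggregate(candidates, g)
print(w[0] > w[1])   # → True: the aligned prompt gets the larger weight
```

The temperature plays the same role as in any softmax gating: low values approach hard selection of the most aligned prompts, high values recover near-uniform averaging.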

6. Open Challenges, Controversies, and Future Directions

Although PBFL demonstrates parameter/communication efficiency and robust performance across diverse settings, several challenges and research frontiers remain:

  • Attack Surfaces: PBFL is vulnerable to prompt-level backdoor poisoning (Zhang et al., 11 Aug 2025). Current defenses (DP, robust aggregation) are not fully effective and can compromise utility or fail under subtle prompt attacks.
  • Aggregation Sensitivity: Uniform prompt averaging can marginalize clients with small or atypical data distributions and can encourage prompt collapse in highly non-IID settings. Adaptive aggregation (attention, similarity graphs, dynamic sampling) mitigates but does not eliminate these issues.
  • Task Generalization: Most work targets classification; extending prompt FL to detection, segmentation, structured prediction, multi-modal generation, or even reinforcement learning is in early stages.
  • Prompt Structure and Scalability: The trade-off between prompt length/capacity and expressivity versus compute/memory is not fully understood; dynamic prompt sizing and hierarchical prompt banks are active research areas.
  • Privacy Guarantees: While prompt-based protocols reduce the exposure of sensitive feature/statistics, formal end-to-end privacy guarantees and the integration of DP/secure aggregation into PBFL pipelines remain open challenges.
  • Theoretical Understanding: Precise convergence rates, generalization bounds under client heterogeneity, and prompt optimization landscapes continue to be developed.

7. Practical Implementation and Deployment Considerations

PBFL is well-suited to edge-device and privacy-sensitive deployments. Key practical guidelines extracted from FLIP (Liao et al., 28 Mar 2025):

  • Use one to two prompts of length four to eight for a strong efficiency/performance trade-off.
  • Prompt learning converges in 1/4–1/10 as many rounds as full-model FL; mixed-precision acceleration and on-device adaptation are feasible.
  • On-device weather forecasting (Chen et al., 2023), medical visual QA (Zhu et al., 2024), and graph learning (Guo et al., 2024) benefit significantly from task-specific prompt formats and personalized aggregation strategies.
  • When domain-shifts or label-skew dominate, customizing prompt modality (visual vs. text), aggregation protocol, and personalization degree (hybrid global–local) is essential for maximizing transferable performance (2505.23024, Cui et al., 2024).

PBFL is establishing itself as a general, efficient, and effective framework for federated adaptation of large foundation models across vision, language, multi-modal, time-series, and specialized domains, with a rapidly expanding body of theoretical, empirical, and systems-level research underpinning its development (Liao et al., 28 Mar 2025).
