Zero-shot Decentralized Federated Learning
- ZeroDFL is a decentralized learning framework that fuses zero-shot generalization with prompt-based adaptation to eliminate the need for a central server.
- It employs efficient, low-dimensional prompt exchanges in a peer-to-peer manner, significantly reducing communication overhead compared to traditional methods.
- The framework facilitates scalable, privacy-preserving adaptation of large pre-trained models in heterogeneous, geo-distributed environments, achieving competitive zero-shot accuracy.
Zero-shot Decentralized Federated Learning (ZeroDFL) is a class of collaborative learning frameworks that combine the ability of modern models to generalize to unseen tasks or classes without retraining (“zero-shot learning”) with decentralized federated architectures that eschew a central server. The ZeroDFL paradigm is characterized by a fully peer-to-peer protocol, efficient knowledge sharing (frequently based on compact text prompts or model outputs), and minimal communication overhead. It targets scalable, privacy-preserving adaptation of large pre-trained models (e.g., CLIP) in geo-distributed, highly heterogeneous environments (Masano et al., 30 Sep 2025).
1. Definition and Core Principles
ZeroDFL fundamentally addresses two challenges: decentralization and zero-shot adaptation. Unlike classical federated prompt learning (e.g., FedCoOp, FedTPG) which requires a central coordinator to aggregate or average learned prompts, ZeroDFL eliminates the central server entirely. All clients collaboratively refine and exchange knowledge—most prominently, learnable textual prompts appended to pre-trained model tokens—through direct peer-to-peer interaction (Masano et al., 30 Sep 2025).
Core principles include:
- Fully decentralized operation: No aggregation through a central server; clients communicate and integrate information exclusively through distributed, peer-to-peer protocols.
- Prompt-based adaptation: Each client learns soft prompts to condition a fixed, pre-trained model (e.g., CLIP) for local tasks and classes; these prompts encode the adaptation signal for zero-shot inference.
- Iterative, distributed prompt exchange: After local adaptation, each client transmits its learned prompts to a small, stochastically selected subset of other clients, guided by communication history to ensure even propagation of knowledge.
- Communication efficiency: By exchanging only low-dimensional prompt vectors instead of full model weights or local gradient information, ZeroDFL reduces total communication volume by over two orders of magnitude compared to centralized prompt generators (Masano et al., 30 Sep 2025); a back-of-envelope size comparison follows this list.
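For a rough sense of scale, the sketch below compares the size of a single M × d prompt exchange with the cost of hypothetically shipping full model weights. The ~151M parameter count for CLIP ViT-B/32 and the fp32 assumption are illustrative additions, not figures from the paper, whose 118× saving is measured against FedTPG rather than full-weight transfer.

```python
# Back-of-envelope payload arithmetic (fp32 assumed; illustrative, not from the paper).
M, d = 4, 512                          # soft tokens per exchange, CLIP embedding dim
prompt_bytes = M * d * 4               # 8,192 B = 8 KiB per prompt set
clip_params = 151_000_000              # ~151M parameters for CLIP ViT-B/32 (approx.)
weight_bytes = clip_params * 4         # ~604 MB if full weights were shipped instead
print(f"prompt set: {prompt_bytes / 1024:.0f} KiB")
print(f"full weights: {weight_bytes / 1e6:.0f} MB "
      f"({weight_bytes / prompt_bytes:,.0f}x larger)")
```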
2. Prompt Learning and Model Formulation
In ZeroDFL, each client maintains a set of learnable prompt vectors (typically, M soft tokens of dimension d, e.g., M = 4, d = 512 for CLIP). These vectors are concatenated with the tokenized class name to form class-specific prompts:
$$t_{i,c} = \big[\,p_i^{(1)}, \dots, p_i^{(M)}, \mathrm{tok}(c)\,\big],$$

where $p_i^{(k)}$ is the $k$-th learned prompt of client $i$ and $\mathrm{tok}(c)$ is the tokenized class name $c$. Class probabilities for input $x$ are then computed via the frozen CLIP visual and text encoders $f_v$ and $f_t$:

$$P(y = c \mid x) = \frac{\exp\big(\mathrm{sim}(f_v(x), f_t(t_{i,c}))/\tau\big)}{\sum_{c'} \exp\big(\mathrm{sim}(f_v(x), f_t(t_{i,c'}))/\tau\big)},$$

with similarity function $\mathrm{sim}(\cdot,\cdot)$ (e.g., cosine similarity) and temperature $\tau$.
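A minimal sketch of this scoring rule, assuming image and per-class text features have already been produced by the frozen encoders (the function name, shapes, and the τ = 0.07 default are illustrative):

```python
import torch
import torch.nn.functional as F

def class_probabilities(image_feat, class_text_feats, tau=0.07):
    """Compute P(y = c | x) for a frozen CLIP-style model.

    image_feat:       (d,) visual embedding f_v(x)
    class_text_feats: (C, d) text embeddings f_t(t_{i,c}), one per class,
                      each built from the client's M soft prompts plus tok(c)
    tau:              softmax temperature
    """
    image_feat = F.normalize(image_feat, dim=-1)        # unit norm -> cosine sim
    class_text_feats = F.normalize(class_text_feats, dim=-1)
    sims = class_text_feats @ image_feat                # one similarity per class
    return F.softmax(sims / tau, dim=-1)

# Example with 10 classes and d = 512:
probs = class_probabilities(torch.randn(512), torch.randn(10, 512))
```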
Each client $i$ minimizes the negative log-likelihood loss on its private dataset $D_i$:

$$\mathcal{L}_i = -\frac{1}{|D_i|} \sum_{(x, y) \in D_i} \log P(y \mid x).$$
This structure enables adaptation for zero-shot classification without modifying the base CLIP encoders, while the prompt vectors encode local generalization knowledge (Masano et al., 30 Sep 2025).
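The snippet below sketches one local prompt-update step under this objective. To keep it self-contained, the frozen CLIP text encoder is replaced by a fixed random linear stand-in, and all names and hyperparameters are illustrative rather than the paper's:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
M, d, C = 4, 512, 10                       # soft tokens, embedding dim, local classes

# Fixed stand-ins for the frozen CLIP components (illustrative only).
class_tok = torch.randn(C, d)              # tok(c) embedding for each class name
text_encoder = torch.nn.Linear((M + 1) * d, d).requires_grad_(False)

prompts = torch.randn(M, d, requires_grad=True)     # the only trainable parameters
opt = torch.optim.SGD([prompts], lr=1e-2)

def logits(image_feat):
    # t_{i,c} = [p^(1), ..., p^(M), tok(c)] -> frozen text encoder -> cosine score
    t = torch.cat([prompts.flatten().expand(C, -1), class_tok], dim=1)
    txt = F.normalize(text_encoder(t), dim=-1)
    return (txt @ F.normalize(image_feat, dim=-1)) / 0.07

# One SGD step on a single (image-feature, label) pair from the private dataset.
x, y = torch.randn(d), torch.tensor([3])
loss = F.cross_entropy(logits(x).unsqueeze(0), y)   # negative log-likelihood
opt.zero_grad(); loss.backward(); opt.step()
```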
3. Decentralized Prompt-Sharing Protocol
The protocol operates in iterative rounds, each comprising two key phases:
- Local prompt adaptation: Clients draw M prompts from a local prompt pool (consisting of previously received prompts and their own current prompts). Using local data, each client fine-tunes its prompts via prompt learning.
- Peer selection and prompt distribution: After adaptation, each client $i$ selects $S$ recipient clients using a probability distribution that favors peers who have been contacted less frequently:

$$P_i(j) = \frac{1/\big(n_{i,j}^{(t)} + \epsilon\big)}{\sum_{j' \neq i} 1/\big(n_{i,j'}^{(t)} + \epsilon\big)},$$

where $n_{i,j}^{(t)}$ counts how often client $j$ has received prompts from client $i$ by round $t$, and $\epsilon > 0$ is a smoothing constant.
This strategy ensures prompt knowledge is evenly propagated, facilitating a global mixing of learned adaptations and preventing communication bottlenecks or isolated subgroups (Masano et al., 30 Sep 2025).
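A short sketch of such a history-aware sampler, assuming the inverse-count form given above (the function name and the ε default are illustrative):

```python
import numpy as np

def select_recipients(counts, S, eps=1.0, rng=None):
    """Sample S distinct peers, favoring those contacted least often.

    counts[j] = how many times peer j has already received our prompts;
    P(j) is proportional to 1 / (counts[j] + eps), with eps > 0 smoothing
    the distribution and avoiding division by zero for fresh peers.
    """
    if rng is None:
        rng = np.random.default_rng()
    weights = 1.0 / (np.asarray(counts, dtype=float) + eps)
    probs = weights / weights.sum()
    return rng.choice(len(probs), size=S, replace=False, p=probs)

history = [5, 0, 2, 0, 7, 1]               # contact counts for six peers
print(select_recipients(history, S=3))     # rarely-contacted peers are favored
```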
Communication Efficiency Table
| Method | Centralized | Decentralized | Communication Overhead |
|---|---|---|---|
| FedCoOp | Yes | No | High (full prompt averaging) |
| FedTPG | Yes | No | Very high (prompt generator) |
| ZeroDFL | No | Yes | Low (compact prompt sharing) |
When exchanging M=4, d=512 prompts with S=5 recipients per round over 500 rounds, ZeroDFL achieved up to 118× lower bandwidth than FedTPG.
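The per-client upload implied by that configuration is easy to verify; the arithmetic below assumes fp32 payloads (the 118× ratio itself is the paper's measurement):

```python
# Per-client upload for the quoted configuration (fp32 assumed).
M, d, S, rounds = 4, 512, 5, 500
per_exchange = M * d * 4                   # 8,192 B for one set of prompts
per_round = per_exchange * S               # 40 KiB per round, across S recipients
total = per_round * rounds                 # 20,480,000 B over the full run
print(f"{total / 2**20:.1f} MiB per client")   # ~19.5 MiB
```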
4. Empirical Results and Performance Characteristics
ZeroDFL was evaluated on nine image classification benchmarks (Caltech101, Flowers102, Oxford Pets, FGVC Aircraft, DTD, UCF101, Food101, Stanford Cars, SUN397), under both heterogeneous settings (different datasets assigned to different clients) and homogeneous ones (classes from a shared dataset partitioned across clients).
- In the heterogeneous scenario, average zero-shot accuracy reached 76.19%, matching or slightly exceeding FedTPG (76.02%) and clearly outperforming FedCoOp variants on most benchmarks.
- As federation size increases (more clients, each holding fewer classes), ZeroDFL maintained stable, robust performance, demonstrating scalability.
- In homogeneous setups, ZeroDFL outperformed FedTPG by almost five percentage points on average.
All of these performance gains were achieved with drastically reduced network communication, and without reliance on any centralized entity or global prompt generator (Masano et al., 30 Sep 2025).
5. Privacy, Scalability, and Real-World Applicability
ZeroDFL is tailored for privacy-critical and bandwidth-constrained environments. Only prompt vectors (text embeddings encoding adaptation signals) are exchanged; no raw data, model weights, or personally revealing information is ever transmitted to a central authority or outside a client’s trusted peer set.
Example applications explicitly mentioned include:
- Edge intelligence and IoT systems, where data aggregation is infeasible.
- Healthcare and finance sectors, where regulatory or ethical constraints block centralization.
- Large-scale multimedia or smart city networks, where decentralized, low-bandwidth collaborative zero-shot adaptation is operationally necessary.
By removing the need for a central aggregator, ZeroDFL is inherently resilient to single-point failures, bandwidth bottlenecks, and privacy leakage risks associated with classical FL architectures (Masano et al., 30 Sep 2025).
6. Future Directions and Open Research Challenges
Several avenues for further improvement in ZeroDFL were suggested:
- Dynamic prompt exchange: Adapting the number of shared prompts per round (M) based on task complexity and bandwidth constraints.
- Personalized versus global prompts: Investigating architectures allowing clients to maintain a hybrid set of personalized (private) prompts and shared (global) prompts to better balance personalization and generalization.
- Advanced communication scheduling: Further optimizing peer selection strategies to minimize redundant exchanges and accelerate convergence.
- Robustness to adversarial manipulation: Reducing shared prompt information content to bolster security in adversarial environments.
A plausible implication is that advances in efficient peer sampling and robust aggregation protocols could further accelerate convergence or enhance resilience in even larger, more heterogeneous federations.
7. Comparative Landscape and Distinguishing Features
ZeroDFL advances beyond prior federated prompt learning methods as summarized below:
| Approach | Central Coordinator | Prompt Type | Aggregation Method | Generalization | Communication |
|---|---|---|---|---|---|
| FedCoOp | Yes | Static Soft Prompts | FedAvg | Moderate | High |
| FedTPG | Yes | Local Generator | Central Aggregation | High | Very High |
| ZeroDFL | No | Shared Soft Prompts | P2P Iterative Sharing | High | Very Low |
ZeroDFL’s distinguishing characteristics are full decentralization, order-of-magnitude communication efficiency, and robust zero-shot generalization verified on diverse, non-i.i.d. settings. This approach establishes a new operational regime for scalable, adaptive federated learning in the era of large multi-modal models (Masano et al., 30 Sep 2025).