One-Shot Federated Learning (OSFL)
- One-Shot Federated Learning is a paradigm that collapses iterative client–server exchanges into a single round to drastically reduce communication costs and mitigate privacy risks.
- It leverages minimal surrogates—such as local models, feature representations, or synthetic data—allowing a central server to synthesize a robust global model without sharing raw data.
- OSFL employs advanced techniques like Bayesian fusion, ensemble distillation, and diffusion-based generative modeling to effectively address non-IID challenges and client heterogeneity.
One-Shot Federated Learning (OSFL) is a federated learning paradigm that collapses the standard, multi-round client–server communication protocol into a single round, aiming to drastically reduce communication cost, minimize privacy risk, and enable scalable collaborative learning in heterogeneous, non-IID environments. In OSFL, each client transmits a minimal set of surrogates (such as models, features, or generative summaries) in a single upload, from which the central server synthesizes a global model without direct access to any private raw data or labels. Recent advances have overcome fundamental obstacles in OSFL’s early parameter-aggregation and distillation methods by integrating feature-aware generative modeling, sophisticated ensemble protocols, and novel optimization frameworks. These developments have positioned OSFL as the enabling technology for federated learning in settings where resource, privacy, or regulatory constraints preclude iterative collaboration.
1. Fundamental Principles and Communication Protocol
The OSFL paradigm can be formally described as follows: let $K$ clients each hold a private dataset $D_k$ drawn from a local distribution $\mathcal{P}_k$, $k = 1, \dots, K$. The objective is to produce a global model $w^\ast$ (or model family $\{w_k^\ast\}_{k=1}^K$) that minimizes generalization error on the mixture $\mathcal{P} = \sum_k p_k \mathcal{P}_k$, subject to the constraint that each client communicates with the server only once. The OSFL protocol is characterized by:
- Single-round transmission: Each client trains a local model (often to completion), optionally generates a surrogate (e.g., synthetic data or compressed feature representations), and uploads it once. The server aggregates these to produce a global solution.
- No raw data exchange: Direct transmission of raw examples is precluded; only models, features, or controlled generative surrogates are sent.
- Hybrid aggregation strategies: The server may ensemble local predictors, perform Bayesian posterior inference, distill knowledge via synthetic data, or fuse generative representations.
Mathematically, the local objective is $F_k(w) = \mathbb{E}_{(x,y) \sim \mathcal{P}_k}[\ell(w; x, y)]$, with the global objective $F(w) = \sum_{k=1}^{K} p_k F_k(w)$ and $p_k = |D_k| / \sum_j |D_j|$; the global model is (trivially) the weighted average $\bar{w} = \sum_k p_k w_k$ in the homogeneous setting, or the result of more complex fusion algorithms under model or data heterogeneity (Guha et al., 2019, Li et al., 2020, Talpini et al., 19 Mar 2025).
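As a concrete reference point, the homogeneous-case aggregation reduces to a single weighted average of the uploaded parameter vectors. The following is a minimal sketch; the function name and the flat-vector representation are illustrative, not taken from any cited framework.

```python
import numpy as np

def one_shot_average(client_weights, client_sizes):
    """Fuse locally trained parameter vectors in a single round.

    client_weights: list of K flat parameter vectors w_k (same architecture)
    client_sizes:   list of local dataset sizes |D_k|
    """
    p = np.asarray(client_sizes, dtype=float)
    p /= p.sum()                   # p_k = |D_k| / sum_j |D_j|
    W = np.stack(client_weights)   # shape (K, d)
    return p @ W                   # w_bar = sum_k p_k w_k
```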
2. Algorithmic Taxonomy and Representative Frameworks
OSFL frameworks can be classified based on how clients transmit knowledge and how the server performs global aggregation.
- Parameter Aggregation and Bayesian Posterior Fusion: Early protocols (e.g., FedAvg, Task Arithmetic) aggregate local model parameters directly. Bayesian schemes such as FedBEns and FedLPA perform (mixture-of-)Laplace posterior fusion, weighting each client’s solution by local curvature to better capture multi-modal and non-i.i.d. effects (Liu et al., 2023, Talpini et al., 19 Mar 2025).
- Knowledge Distillation via Ensembles and Synthetic Data: Approaches such as DENSE, Co-Boosting, and FALCON employ data-free ensemble distillation. Clients upload local models; the server forms an ensemble teacher and distills it into a global student using synthetic data produced through adversarial generation, knowledge-guided sampling, or hierarchical feature modeling (a minimal server-side sketch follows this list) (Dai et al., 2024, Liu et al., 7 Jan 2026).
- Generative and Diffusion Model-Based OSFL: Modern protocols leverage powerful generative models. Clients upload compressed representations (e.g., prototype embeddings, token sequences, learned descriptions) which condition a server-side diffusion or flow-based generator to synthesize surrogate datasets for global model training. Notable examples include FedDEO (description-guided diffusion), OSCAR (classifier-free diffusion with CLIP/BLIP), FedLMG (local classifier-guided diffusion), FedBiP (bi-level personalized latent diffusion), and feature-level rectified flows (Yang et al., 2024, Zaland et al., 12 Feb 2025, Yang et al., 2023, Chen et al., 2024, Ma et al., 25 Jul 2025).
- Causality-Informed Model Fusion and Adaptive Ensembles: FuseFL introduces progressive block-wise fusion of neural network segments, guided by causal invariance to mitigate the isolation problem of local ERM and ensure robustness to spurious correlations in heterogeneous data (Tang et al., 2024).
- Personalized and Clustered OSFL: Clustered and bi-level optimization methods (e.g., FedBiCross) group clients by model-output similarity, generate personalized synthetic data via trajectory-based inversion, and learn adaptive, cross-cluster distillation weights for optimal fusion in highly non-IID settings (a clustering sketch follows the table below) (Xia et al., 5 Jan 2026).
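To make the ensemble-distillation branch concrete, here is a hedged server-side sketch in the spirit of DENSE-style data-free distillation. The function `make_synthetic_batch`, the hyperparameters, and all names are illustrative assumptions; the synthetic-data generator itself is abstracted away.

```python
import torch
import torch.nn.functional as F

def distill_global_model(client_models, student, make_synthetic_batch,
                         steps=1000, lr=1e-3, temperature=2.0):
    """Train a fresh student to match the averaged logits of the
    uploaded client models on synthetic inputs (illustrative sketch)."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for m in client_models:
        m.eval()
    for _ in range(steps):
        x = make_synthetic_batch()   # e.g. output of a trained generator
        with torch.no_grad():        # ensemble teacher: mean of client logits
            teacher_logits = torch.stack([m(x) for m in client_models]).mean(0)
        loss = F.kl_div(
            F.log_softmax(student(x) / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean") * temperature ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```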
The following table summarizes a selection of representative OSFL frameworks, their core client-side transmission, and server aggregation strategy:
| Method | Client-to-Server Upload | Server Aggregation Type |
|---|---|---|
| FedBEns | Multiple (model, Hessian) pairs | Bayesian mixture-of-Laplaces fusion |
| DENSE, Co-Boosting | Local models | Synthetic-data distillation ensemble |
| FALCON | Synthetic tokens, local classifier | Hierarchical distillation |
| FedDEO | Description vectors | Conditional diffusion synthesis |
| OSCAR | CLIP/BLIP prototypes | Classifier-free diffusion sampling |
| FedLMG | Model, BN stats | Classifier-guided diffusion |
| FedBiP | Noised latents + token vectors | Bi-level diffusion personalization |
| FuseFL | Sub-network blocks | Progressive blockwise fusion |
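As a minimal illustration of the clustered branch above (see FedBiCross in the taxonomy), one plausible first step is to group clients by the similarity of their models' predictions on a shared probe batch. Everything below, including the use of k-means and the assumption that client models are callables returning prediction arrays, is an illustrative sketch rather than the cited method.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_clients_by_outputs(client_models, probe_inputs, num_clusters=3):
    # Flatten each model's prediction profile on the probe batch into a
    # signature vector; clients with similar signatures share a cluster.
    signatures = np.stack(
        [np.asarray(m(probe_inputs)).ravel() for m in client_models])
    return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(signatures)
```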
3. Addressing Data Heterogeneity and Non-IID Challenges
A critical barrier for OSFL is aggregation under heterogeneity—both statistical (non-IID distributions) and architectural (model differences). Standard parameter averaging is provably suboptimal under strong heterogeneity, leading to performance far from the centralized optimum (Liu et al., 13 Feb 2025, Amato et al., 5 May 2025). Contemporary frameworks address this in several ways:
- Curvature-aware fusion: Bayesian aggregations (FedBEns, FedLPA) weight client updates by local curvature, mitigating the dominance of overfitted or outlier clients in skewed regimes (see the sketch after this list) (Liu et al., 2023, Talpini et al., 19 Mar 2025).
- Ensemble partition and clustering: FedBiCross addresses inter-client conflict by clustering clients with similar model outputs and performing bi-level cross-cluster optimization with adaptive weights (Xia et al., 5 Jan 2026).
- Generative modeling of client distributions: Diffusion-based methods condition the synthetic generation on client-level embeddings (FedDEO, OSCAR, FedBiP), allowing the server to match the diversity of true client distributions without relying on public data or exact feature alignment (Yang et al., 2024, Zaland et al., 12 Feb 2025, Chen et al., 2024).
- Causal feature alignment and progressive fusion: FuseFL prevents models from fitting to client-specific spurious features by augmenting intermediate representations and fusing blocks across clients in a causality-robust manner (Tang et al., 2024).
- Personalized aggregation: Some frameworks (FedBiCross, FedBiP, FALCON) support post-aggregation personalization, enabling local adaptation to each client’s distribution.
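A minimal sketch of the curvature-aware idea, assuming each client uploads its local solution together with a diagonal curvature (e.g., Fisher) estimate: under a diagonal Gaussian posterior approximation, the fused mean is the precision-weighted average of the local solutions. Names and the prior term are illustrative assumptions, not the cited algorithms.

```python
import numpy as np

def curvature_weighted_fuse(client_means, client_precisions,
                            prior_precision=1e-4):
    """Precision-weighted fusion of diagonal Laplace posteriors.

    client_means:      list of K flat local solutions w_k
    client_precisions: list of K diagonal curvature estimates h_k >= 0
    """
    W = np.stack(client_means)       # (K, d)
    H = np.stack(client_precisions)  # (K, d)
    fused_precision = H.sum(axis=0) + prior_precision
    return (H * W).sum(axis=0) / fused_precision
```

High-curvature (confidently fit) coordinates dominate the fusion, so a client that barely constrains a parameter cannot drag the global solution away from clients that do.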
Empirical results across standard vision and medical domains (e.g., Tuberculosis X-ray, PACS, OfficeHome, DermaMNIST) demonstrate that advanced OSFL algorithms frequently outperform naive baselines (FedAvg-1, vanilla aggregation), closing gaps of 9–30 percentage points relative to multi-round or centralized references (Liu et al., 7 Jan 2026, Xia et al., 5 Jan 2026).
4. Privacy, Communication Efficiency, and Practical Constraints
OSFL prioritizes privacy and resource minimization. Eliminating iterative server–client exchanges directly shrinks the attack surface, and the most advanced methods further reduce exposure:
- Privacy: Generative surrogates and feature-level representations reveal significantly less information than raw data; inversion attacks on synthetic tokens or noised latents yield low-fidelity reconstructions (e.g., PSNR ≈ 13 dB, SSIM ≈ 0.15 for FALCON synthetic tokens) (Liu et al., 7 Jan 2026). Personalized diffusion models can further reduce membership inference risks substantially (Chen et al., 2024, Ma et al., 25 Jul 2025).
- Communication cost: Diffusion frameworks such as OSCAR can compress the per-client upload to less than 1% of that of traditional OSFL methods, using only class-wise embeddings (e.g., 0.03M parameters vs. 4–11M for prior approaches); the implied sizes are worked out after this list (Zaland et al., 12 Feb 2025). FALCON allows clients to upload only compact synthetic sequences or the generator, keeping uploads as low as 15 KB per synthetic sample (Liu et al., 7 Jan 2026).
- Compute: Modern token- or feature-level generation (e.g., FALCON, FG-RF) offers several orders of magnitude lower inference cost compared to pixel-level or diffusion generator baselines (e.g., ~0.46 GFLOPs/sample for FALCON vs. 7,600 GFLOPs/sample for FedLMG) (Liu et al., 7 Jan 2026).
- Model heterogeneity: Systems such as Co-Boosting and FedLMG allow client heterogeneity by decoupling the knowledge transfer process from architectural specifics (Dai et al., 2024, Yang et al., 2023).
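For intuition on the reported compression, the following back-of-the-envelope calculation converts the cited parameter counts into upload sizes, assuming float32 (4-byte) parameters; the counts mirror the OSCAR comparison above and the rest is illustrative.

```python
def upload_mib(num_params, bytes_per_param=4):
    # float32 parameters -> mebibytes
    return num_params * bytes_per_param / 2**20

print(f"class-wise embeddings: {upload_mib(0.03e6):.2f} MiB")  # ~0.11 MiB
print(f"full local model:      {upload_mib(4e6):.1f}-{upload_mib(11e6):.1f} MiB")
```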
5. Theoretical Foundations and Convergence Guarantees
Theoretical analysis of OSFL lags practical progress but is advancing. Foundational results provide performance bounds and clarify when and how one-shot protocols approach centralized baselines:
- Averaging bounds: Naive parameter averaging incurs an error that grows with inter-client heterogeneity and is provably suboptimal when local optima diverge (Guha et al., 2019); a toy illustration follows this list.
- Posterior fusion: Bayesian and Fisher-weighted aggregation reduce this error by matching local posterior curvature (Talpini et al., 19 Mar 2025, Liu et al., 2023).
- Heterogeneity gap: Recent OSFL theory decomposes suboptimality into terms capturing data and training heterogeneity, which remain irreducible in the one-shot regime even as the number of clients or local sample sizes grows (Tao et al., 2024).
- KL-divergence bounds: For diffusion-based surrogate generation, conditional synthetic data quality can be bounded by the description’s negative log-likelihood over the local distribution and the overlap with the pre-trained model prior (Yang et al., 2024).
- Causal analysis: Causal feature fusion (FuseFL) leverages the information bottleneck framework to provably diminish the mutual information between representations and spurious features, robustifying global models to data variation and corruptions (Tang et al., 2024).
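The gap between naive averaging and curvature-aware fusion is visible even in a two-client toy problem with quadratic local losses $F_k(w) = \tfrac{1}{2} h_k (w - m_k)^2$; all numbers below are illustrative.

```python
# Two clients with different curvatures h_k and local minimizers m_k.
h = [10.0, 1.0]
m = [0.0, 2.0]

def global_loss(w):
    return sum(0.5 * hk * (w - mk) ** 2 for hk, mk in zip(h, m))

naive = sum(m) / len(m)                                # plain average: 1.0
fused = sum(hk * mk for hk, mk in zip(h, m)) / sum(h)  # precision-weighted: ~0.18
print(global_loss(naive), global_loss(fused))          # 5.5 vs ~1.82
```

Because the plain average ignores curvature, it is pulled toward the flat client's minimizer and pays a much higher global loss than the precision-weighted solution.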
6. Benchmarking Results and Empirical Outcomes
Systematic experiments across medical imaging, domain-shifted natural images, and generic federated benchmarks substantiate the quantitative advantages of modern OSFL. Key highlights include:
- FALCON: Achieves 85.92% average test accuracy, outperforming the best OSFL baselines by 9.58% on non-IID vision and medical tasks, and sustaining robustness under high label heterogeneity (Liu et al., 7 Jan 2026).
- FedBiCross: Delivers 85.57% accuracy on BloodMNIST (Dirichlet concentration 0.1, K=4), up to +30% over competing personalized and ensemble FL methods (Xia et al., 5 Jan 2026).
- FedDEO: Surpasses both diffusion- and ensemble-based baselines (FedDISC, FGL, FedCADO) by up to 8 percentage points and occasionally exceeds the performance of centralized reference models, especially on diversity-intensive, open-world datasets (Yang et al., 2024).
- OSCAR: Outperforms prior classifier-guided diffusion OSFL protocols while reducing client communication load by at least 99% (Zaland et al., 12 Feb 2025).
- FG-RF+DLKD: On non-IID medical imaging, improves over the best one-shot and multi-round FL baselines by 14–31% and yields lower privacy leakage than pixel-level synthetic images (Ma et al., 25 Jul 2025).
Ablation studies confirm the additive value of hierarchical tokenization (FALCON), co-boosted ensemble refinement (Co-Boosting), bi-level personalization (FedBiP), and intermediate feature fusion (FuseFL).
7. Open Problems and Future Directions
Despite these advances, OSFL remains an active research area with several unresolved challenges:
- Scalability: As client count and model complexity grow, the cost of synthetic data generation and ensemble distillation can increase superlinearly (Amato et al., 5 May 2025).
- Generalization under extreme non-IID: While the latest generative and clustering frameworks mitigate basic heterogeneity, fully closing the gap with iterative FL on long-tail, multi-modal, or adversarial splits is an open challenge (Liu et al., 13 Feb 2025, Amato et al., 5 May 2025).
- Privacy amplification: Formal differential privacy for upload surrogates and robust analysis of adversarial attacks on synthetic representations are subjects of ongoing work (Liu et al., 2023, Liu et al., 13 Feb 2025).
- Integration with LLM and Web 3.0 ecosystems: As FL is adopted for generative models (LLMs, multimodal agents), OSFL protocols must extend to transformer-scale architectures, incentive mechanisms (e.g., tokenized rewards, LOO values in OFL-W3), and decentralized storage/blockchain settings (Jiang et al., 2024, Liu et al., 13 Feb 2025).
- Fast/efficient diffusion and multimodal extension: Reducing reverse-diffusion latency and extending paradigms to segmentation, detection, and non-vision modalities (NLP, speech) remains underexplored (Zaland et al., 12 Feb 2025, Ma et al., 25 Jul 2025).
- Benchmarking and reproducibility: Standardized evaluation protocols for OSFL, with shared datasets, architectural references, and privacy/utility metrics, are still lacking (Amato et al., 5 May 2025).
Through the continual integration of hierarchical generative feature modeling, adaptive ensemble distillation, and privacy-aware aggregation strategies, the OSFL paradigm has established itself as a highly practical solution for collaborative machine learning in bandwidth- and privacy-constrained settings. It has demonstrated empirical progress across challenging non-IID benchmarks and shows strong potential for further impact in emerging federated and decentralized AI deployments (Liu et al., 7 Jan 2026, Xia et al., 5 Jan 2026, Yang et al., 2024, Zaland et al., 12 Feb 2025, Liu et al., 2023, Talpini et al., 19 Mar 2025, Tang et al., 2024, Amato et al., 5 May 2025).