
Generative Federated Learning

Updated 12 March 2026
  • Generative Federated Learning is a framework that federates generative models like GANs, VAEs, and diffusion models across decentralized data to enhance privacy and personalization.
  • It facilitates the joint training and aggregation of synthetic data or model parameters to address data heterogeneity and augment scarce modalities.
  • GenFL advances robust, efficient, and privacy-preserving distributed learning while introducing challenges in algorithm design and communication efficiency.

Generative Federated Learning (GenFL) is an extension of traditional Federated Learning (FL) in which generative models (such as GANs, VAEs, or diffusion models) are trained in a distributed manner over decentralized client data without explicit data sharing. GenFL systems enable the joint training, exchange, or aggregation of generative models and/or their outputs (synthetic data) across clients, with primary aims including privacy preservation, mitigation of data heterogeneity, augmentation of scarce modalities, and personalization. The integration of generative models into FL has driven advances in privacy, robustness, and adaptability across a broad range of distributed learning scenarios, while introducing distinct algorithmic, privacy, and system-level challenges (Puppala et al., 2024, Mukherjee et al., 24 Oct 2025, Gargary et al., 2024).

1. Core Principles and Formal Problem Setting

GenFL augments the standard FL paradigm—which aggregates discriminative model updates—with generative mechanisms that either generate and share synthetic data or federate the generative model parameters themselves (Puppala et al., 2024, Gargary et al., 2024).

The GenFL formal objective generalizes the FL aggregation rule to either synthetic data or generative parameters:

$$\min_{\{\theta_k\}} \sum_{k=1}^K w_k \mathcal{L}_k^{\mathrm{gen}}(\theta_k) \quad \text{s.t.} \quad \theta_k \approx \theta, \ \forall k,$$

with $w_k$ a data- or task-dependent client weight.
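One simple way to obtain a shared $\theta$ that the client parameters $\theta_k$ stay close to is a weighted average in the style of FedAvg. The sketch below is illustrative only: it assumes flat parameter vectors and weights $w_k$ proportional to local dataset size, and the function name `aggregate` and the toy numbers are hypothetical rather than taken from the cited works.

```python
from typing import List, Sequence


def aggregate(client_params: Sequence[Sequence[float]],
              client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of flat parameter vectors (FedAvg-style).

    Assumption: w_k = n_k / sum_j n_j, i.e. weights proportional to the
    local dataset size. Other data- or task-dependent weights are possible.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    dim = len(client_params[0])
    theta = [0.0] * dim
    for params, n_k in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        w_k = n_k / total
        for j, value in enumerate(params):
            theta[j] += w_k * value
    return theta


# Toy usage: three clients, two generator parameters each.
theta = aggregate([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]], [10, 30, 60])
```

A plain average like this only makes sense when all clients share one generator architecture; heterogeneous settings need a different exchange mechanism.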

Distinct GenFL workflows include: (i) sharing synthetic samples instead of, or alongside, model updates (Puppala et al., 2024), (ii) aggregating local generative model updates (Mukherjee et al., 24 Oct 2025), and (iii) leveraging server-side generative modules to actively generate synthetic data in response to global data deficiencies (Qiang et al., 26 Mar 2025).

2. System Architectures and Federated Protocols

GenFL architectures are categorized by the granularity, modality, and flow of generative information:

  • Client-Side Model Training and Synthetic Output Exchange: Each client trains a generative model (e.g., GAN, VAE); synthetic samples or generator parameters are transmitted either to a central server for aggregation or directly to peers in decentralized configurations (Pérez et al., 23 Jul 2025, Puppala et al., 2024).
  • Server-Side Generative Augmentation: The server maintains a global generative model or pool, synthesizing examples to address label imbalance and data scarcity, and then integrating these into the downstream FL task (Qiang et al., 26 Mar 2025, Zhang et al., 2023).
  • Selective/Partial Model Sharing: To reduce communication and privacy costs, methods such as PS-FedGAN only transmit partial model components (e.g., discriminators/seeds, not full generators), updating private or “shadow” server-side generators (Wijesinghe et al., 2023).
  • Model Heterogeneity and Personalization: In heterogeneous settings where clients use distinct architectures, frameworks like GeFL enable model-agnostic learning via a shared generative model. Federated synthetic data carries knowledge across clients, which supports clients whose architectures are otherwise incompatible (Kang et al., 2024).
  • Diffusion-Based Parameter Aggregation: Approaches such as pFedGPA use diffusion models to aggregate high-dimensional client parameters on a nonlinear manifold, offering improved adaptation and client-specific generation of personalized models (Lai et al., 2024).
  • Blockchain Protocols: For tamper-proof auditability and incentive management, blockchain-based protocols integrate validation, consensus, and rewards for generative model contributions, often via smart contracts (Puppala et al., 2024).
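To make the server-side augmentation pattern in the list above more concrete, the sketch below computes how many synthetic samples per class a server might request from its generative model to offset label imbalance. It is a minimal illustration under stated assumptions: the server is assumed to have access to an aggregate label histogram, the balancing target is the size of the largest class, and `synthetic_quota` is a hypothetical helper, not an API from any cited framework.

```python
from typing import Dict


def synthetic_quota(label_counts: Dict[int, int],
                    num_classes: int) -> Dict[int, int]:
    """Per-class number of synthetic samples needed to match the largest class.

    Assumption: `label_counts` is an aggregate histogram over all clients;
    classes missing from it are treated as having zero real samples.
    """
    counts = [label_counts.get(c, 0) for c in range(num_classes)]
    target = max(counts) if counts else 0
    return {c: target - n for c, n in enumerate(counts)}


# Toy usage: class 3 has no real samples anywhere in the federation.
quota = synthetic_quota({0: 300, 1: 120, 2: 45}, num_classes=4)
```

Sharing an exact label histogram can itself leak information, so a real system might add noise to it or use coarser statistics.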

3. Algorithmic and Mathematical Foundations

GenFL instantiates several algorithmic recipes, frequently unifying adversarial or probabilistic generative training with federated optimization:

  • Federated GAN Training: Clients locally solve:

$$\min_{G_k} \max_{D_k} \ \mathbb{E}_{x \sim P_k}[\log D_k(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D_k(G_k(z)))]$$

and transmit updates for aggregation, e.g., FedAvg on generator/discriminator weights (Puppala et al., 2024, Mukherjee et al., 24 Oct 2025).
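The local GAN objective can be estimated from a batch of discriminator outputs. The following sketch is a Monte Carlo estimate of the value function for one client, assuming the discriminator returns probabilities in [0, 1]; the helper name `gan_value` and the clipping constant are illustrative choices.

```python
import math
from typing import Sequence


def gan_value(d_real: Sequence[float], d_fake: Sequence[float],
              eps: float = 1e-12) -> float:
    """Estimate E[log D(x)] + E[log(1 - D(G(z)))] from batch outputs.

    d_real: discriminator outputs D_k(x) on real samples x ~ P_k.
    d_fake: discriminator outputs D_k(G_k(z)) on generated samples.
    Values are clipped at `eps` to avoid log(0).
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Toy usage: a discriminator that separates real from fake fairly well.
v = gan_value([0.9, 0.8], [0.2, 0.1])
```

The discriminator ascends this value while the generator descends it. When the discriminator outputs 0.5 everywhere, the estimate equals $-\log 4$, the value at the theoretical equilibrium.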

  • Federated VAE Training: With encoder $q_{\phi_k}(z|x)$ and decoder $p_{\theta_k}(x|z)$, clients update parameters to maximize:

$$\mathcal{L}_{\mathrm{ELBO}}(x) = \mathbb{E}_{q_{\phi_k}(z|x)}[\log p_{\theta_k}(x|z)] - \mathrm{KL}(q_{\phi_k}(z|x) \,\|\, p(z))$$

and perform FedAvg or other aggregation (Gulati et al., 15 Dec 2025, Puppala et al., 2024).
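For intuition, the ELBO has a closed form for its KL term under common modeling choices. The sketch below assumes a diagonal Gaussian encoder, a standard normal prior $p(z)$, and a unit-variance Gaussian decoder, with the expectation approximated by a single reconstruction `x_hat`. None of these choices are prescribed by the cited works, and the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float],
                          log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def gaussian_log_likelihood(x: Sequence[float],
                            x_hat: Sequence[float]) -> float:
    """log p(x|z) for a unit-variance Gaussian decoder with mean x_hat."""
    sq_err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return -0.5 * sq_err - 0.5 * len(x) * math.log(2.0 * math.pi)


def elbo(x: Sequence[float], x_hat: Sequence[float],
         mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Single-sample ELBO estimate: reconstruction term minus KL term."""
    return gaussian_log_likelihood(x, x_hat) - kl_to_standard_normal(mu, log_var)


# Toy usage with a 2-D input and a 2-D latent code.
value = elbo([0.2, -0.1], [0.1, 0.0], [0.3, -0.2], [-0.5, 0.1])
```

Each client would maximize this quantity over its local data before the encoder and decoder parameters are aggregated.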

  • Diffusion Models: Diffusion can operate in either data space or parameter space. In pFedGPA, a server-side diffusion model learns to integrate client parameter distributions by minimizing the noise-prediction loss:

$$L_{\mathrm{ddpm}} = \mathbb{E}_{t, z_0, \epsilon} \left\| \epsilon - \epsilon_\phi(z_t, t) \right\|_2^2$$

This generative framework decouples local versus global complexity for personalized FL (Lai et al., 2024).
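The noise-prediction loss can be illustrated with the standard DDPM forward process, in which $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$. The sketch below assumes a linear toy noise schedule and treats $z_0$ as a small flat vector (in the parameter-space setting it would stand for client parameters). The helper names are hypothetical, and the "oracle" predictor exists only to check the algebra.

```python
import math
import random
from typing import List, Sequence


def alpha_bar(betas: Sequence[float], t: int) -> float:
    """Cumulative product of (1 - beta_s) for s = 0..t (t is 0-based)."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def noise_sample(z0: Sequence[float], eps: Sequence[float],
                 a_bar: float) -> List[float]:
    """Forward process: z_t = sqrt(a_bar) * z_0 + sqrt(1 - a_bar) * eps."""
    return [math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e
            for z, e in zip(z0, eps)]


def ddpm_loss(eps: Sequence[float], eps_pred: Sequence[float]) -> float:
    """Squared L2 distance between true and predicted noise."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred))


rng = random.Random(0)
betas = [0.01 * (i + 1) for i in range(10)]   # assumed toy schedule
z0 = [0.5, -1.0, 2.0]
t = 4
eps = [rng.gauss(0.0, 1.0) for _ in z0]
a_bar = alpha_bar(betas, t)
z_t = noise_sample(z0, eps, a_bar)

# An oracle that knows z_0 can invert the forward process exactly.
eps_oracle = [(zt - math.sqrt(a_bar) * z) / math.sqrt(1.0 - a_bar)
              for zt, z in zip(z_t, z0)]
loss_oracle = ddpm_loss(eps, eps_oracle)
loss_zero = ddpm_loss(eps, [0.0] * len(eps))  # baseline: predict no noise
```

In training, a learned network $\epsilon_\phi(z_t, t)$ replaces the oracle and the loss is averaged over random $t$, $z_0$, and $\epsilon$.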

  • Data Augmentation in FL: Synthetic data $D^{\mathrm{gen}}$ are merged with private data, so that client $i$, holding real data $D_i$ and synthetic samples $\tilde{D}_i$, minimizes:

$$\mathcal{L}_i^{(\mathrm{mix})}(\theta) = \frac{1}{|D_i| + |\tilde{D}_i|} \sum_{(x, y) \in D_i \cup \tilde{D}_i} \ell(f_\theta(x), y)$$

showing empirical gains versus vanilla FL (Ye et al., 2023).
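The mixed objective is an average of per-sample losses over the union of real and synthetic examples. A minimal sketch follows, assuming a generic `predict` function and per-sample `loss_fn`; both names, and the toy linear model, are placeholders rather than anything specified in the cited work.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, label y)


def mixed_loss(predict: Callable[[float], float],
               loss_fn: Callable[[float, float], float],
               real: Sequence[Sample],
               synthetic: Sequence[Sample]) -> float:
    """Average loss over real and synthetic samples, weighted equally."""
    data = list(real) + list(synthetic)
    if not data:
        raise ValueError("need at least one sample")
    return sum(loss_fn(predict(x), y) for x, y in data) / len(data)


def squared_error(y_hat: float, y: float) -> float:
    return (y_hat - y) ** 2


def model(x: float) -> float:
    # Toy stand-in for f_theta: a fixed linear model.
    return 2.0 * x


real_data = [(1.0, 2.0), (2.0, 5.0)]
synthetic_data = [(3.0, 6.0), (0.0, 1.0)]
loss = mixed_loss(model, squared_error, real_data, synthetic_data)
```

Because every sample carries the same weight, a large synthetic set can dominate the objective; down-weighting synthetic samples is a common variation but is not part of the formula above.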

  • Adaptive Aggregation: FedCAR adaptively re-weights client generator updates using cross-client FID distances, favoring contributions that better align with the target distribution (Kim et al., 2024).
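Distance-based re-weighting of this kind can be sketched by mapping each client's distance score to an aggregation weight so that smaller distances receive larger weights. The softmax over negative distances below is one generic choice and is not claimed to be FedCAR's actual rule; the `temperature` parameter and the function name are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def adaptive_weights(distances: Sequence[float],
                     temperature: float = 1.0) -> List[float]:
    """Softmax over negative distances: lower distance gives higher weight.

    distances: one FID-like score per client (lower means better aligned).
    temperature: larger values flatten the weights toward uniform.
    """
    if not distances:
        raise ValueError("need at least one distance")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = min(distances)  # subtract the minimum for numerical stability
    scores = [math.exp(-(d - best) / temperature) for d in distances]
    total = sum(scores)
    return [s / total for s in scores]


# Toy usage: the first client's generator is closest to the target.
weights = adaptive_weights([10.0, 20.0, 40.0], temperature=10.0)
```

The resulting weights sum to one and can be used in place of dataset-size weights when averaging generator updates.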

4. Privacy, Security, and Communication Constraints

GenFL architectures are often designed to maximize privacy and robustness:

5. Empirical Evaluation and Applications

GenFL frameworks have been evaluated across a variety of verticals and tasks, under both centralized and decentralized, synchronous and asynchronous regimes:

| Context | GenFL Role | Main Empirical Findings | Reference |
|---|---|---|---|
| Healthcare | VAE-based imputation, personalized risk prediction | Federated VAE learning and synthetic sample generation improve privacy and minority class coverage. | (Puppala et al., 2024, Mukherjee et al., 24 Oct 2025) |
| IoT/Edge | Edge GAN generation, decentralized FL | Decentralized GenFL improves robustness, reduces latency, and outperforms classical cloud-centric systems by up to 12% in accuracy, with a 73% reduction in response time. | (Mukherjee et al., 24 Oct 2025) |
| Heterogeneous FL | Model-agnostic/foundation models | Generative prompt-based or feature-level models support arbitrarily diverse architectures and mitigate privacy leakage. | (Kang et al., 2024, Zhang et al., 2023) |
| Medical Imaging | FID-adaptive GAN aggregation | Cross-institutional StyleGAN2 training with FedCAR improves FID scores over centralized and standard FL, even in severe non-IID regimes. | (Kim et al., 2024) |
| Persistent/Continual FL | ACGAN replay + model consolidation | Mitigates catastrophic forgetting and stabilizes generator quality in class-incremental, multi-round non-IID streams. | (Qi et al., 2023) |

Synthetic data generated under GenFL has been shown empirically to match or, under certain non-IID conditions, surpass centralized baselines in classification accuracy, FID, and resilience to membership inference and model inversion attacks (Triastcyn et al., 2019, Zhang et al., 2023, Gargary et al., 2024, Kim et al., 2024, Ye et al., 2023).

6. Advanced Topics, Limitations, and Open Directions

7. Representative Evaluation Metrics

GenFL works report both standard discriminative and generative metrics, often including:


Generative Federated Learning synthesizes federated and generative paradigms to unlock privacy-preserving, robust, and data-efficient distributed modeling under diverse and heterogeneous client environments. Ongoing research continues to advance its theory, scalability, privacy analysis, and application breadth (Puppala et al., 2024, Gargary et al., 2024, Mukherjee et al., 24 Oct 2025, Kim et al., 2024).
