FedBiCross: OSFL for Non-IID Medical Data
- FedBiCross is a bi-level optimization framework that enables data-free one-shot federated learning by aggregating decentralized models via clustering and adaptive weight optimization.
- It employs K-means clustering on client prediction matrices and deep-inversion techniques to generate synthetic data, mitigating the near-uniform soft-label problem in non-IID settings.
- The framework achieves significant performance gains on MedMNIST datasets and supports personalized fine-tuning for privacy-sensitive clinical applications.
FedBiCross is a bi-level optimization framework introduced for data-free one-shot federated learning (OSFL) under non-IID (not independent and identically distributed) settings, with particular emphasis on privacy-sensitive medical imaging data. Unlike conventional federated learning solutions that require multiple communication rounds or direct access to raw data, FedBiCross performs knowledge aggregation in a single round through the exchange of models only. It addresses a major challenge in OSFL: the destructive effect of aggregating predictions from non-IID clients, which can result in near-uniform soft labels and inadequate supervision for distillation (Xia et al., 5 Jan 2026).
1. OSFL under Non-IID Medical Data: Problem Formulation
In OSFL, each of $N$ clients $k \in \{1, \dots, N\}$ holds a private dataset $D_k$ with substantial distribution skew. After clients upload their locally trained models $f_k$, the server is restricted to a single communication round and aims to produce a set of personalized models, each tailored to one client's data distribution.
Key notations:
- $p_k(x) \in \Delta^{C}$: client $k$'s soft-prediction vector over $C$ classes.
- $\bar{p}(x) = \frac{1}{N}\sum_{k=1}^{N} p_k(x)$: uniform ensemble teacher.
- Knowledge distillation loss on synthetic inputs $x \sim \mathcal{X}_{\mathrm{syn}}$:

$$\mathcal{L}_{\mathrm{KD}}(\theta) = \mathbb{E}_{x \sim \mathcal{X}_{\mathrm{syn}}}\big[\mathrm{KL}\big(\bar{p}(x)\,\|\,p_\theta(x)\big)\big]$$
Under strong non-IID skew, the averaged predictions $\bar{p}(x)$ can become nearly uniform, diminishing the supervisory signal available for student learning.
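To make this failure mode concrete, the following minimal PyTorch sketch computes the uniform-ensemble distillation loss defined above; the temperature `tau` and the tensor shapes are illustrative assumptions, not specifications from the paper.

```python
import torch
import torch.nn.functional as F

def ensemble_kd_loss(client_logits, student_logits, tau=1.0):
    """KL divergence between the uniform ensemble teacher and the student.

    client_logits: tensor of shape (N_clients, batch, C) -- per-client outputs
    student_logits: tensor of shape (batch, C)
    """
    # Uniform ensemble teacher: average the clients' soft predictions.
    teacher_probs = F.softmax(client_logits / tau, dim=-1).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits / tau, dim=-1)
    # KL(teacher || student), averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * tau**2
```

When clients disagree strongly, `teacher_probs` flattens toward the uniform distribution and the KL gradient carries little class information, which is exactly the pathology FedBiCross targets.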
2. Client Clustering and Sub-Ensemble Construction
To counteract “teacher disagreement” from naïve averaging, FedBiCross applies a clustering stage based on output similarity of client models.
- Output Similarity Measurement: generate $M$ shared random noise inputs $\{z_i\}_{i=1}^{M}$; for each client $k$, construct the prediction matrix $P_k = \big[p_k(z_1); \dots; p_k(z_M)\big] \in \mathbb{R}^{M \times C}$.
- Clustering via $K$-Means: partition clients into $G$ clusters $\{C_g\}_{g=1}^{G}$ by minimizing the Frobenius-norm distortion

$$\min_{\{C_g\}} \sum_{g=1}^{G} \sum_{k \in C_g} \big\|P_k - \mu_g\big\|_F^2$$

with centroid $\mu_g = \frac{1}{|C_g|}\sum_{k \in C_g} P_k$.
- Sub-Ensemble Teacher Construction: for cluster $C_g$, define $T_g(x) = \frac{1}{|C_g|}\sum_{k \in C_g} p_k(x)$.
- Deep Inversion for Synthetic Data: synthetic batches $\hat{x}_g$ for each cluster are generated via iterative gradient optimization

$$\hat{x} \leftarrow \hat{x} - \eta\,\nabla_{\hat{x}}\,\mathcal{L}_{\mathrm{inv}}(\hat{x}),$$

where $\mathcal{L}_{\mathrm{inv}}$ contains cross-entropy, total-variation, and batch-norm regularization terms.
Noise-adapted teachers are constructed by updating batch-normalization statistics along the inverse synthesis trajectory.
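A compact sketch of the clustering and synthesis stages, under stated assumptions: `client_models` are trained classifiers sharing one architecture, the deep-inversion loss keeps only the cross-entropy and total-variation terms (the paper's batch-norm regularization and noise-adapted BN updates are omitted for brevity), and all hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

@torch.no_grad()
def cluster_clients(client_models, num_clusters, num_probes=256, in_shape=(3, 28, 28)):
    """Group clients by prediction similarity on shared random noise probes."""
    z = torch.randn(num_probes, *in_shape)            # shared noise inputs
    rows = []
    for model in client_models:
        model.eval()
        probs = F.softmax(model(z), dim=-1)           # prediction matrix P_k
        rows.append(probs.flatten().numpy())          # flattened -> Frobenius distance
    return KMeans(n_clusters=num_clusters, n_init=10).fit_predict(rows)

def deep_invert(cluster_models, targets, steps=500, lr=0.05,
                tv_weight=1e-4, in_shape=(3, 28, 28)):
    """Synthesize one batch for a cluster by gradient descent on the inputs."""
    x = torch.randn(targets.size(0), *in_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Sub-ensemble teacher T_g: average soft predictions within the cluster.
        probs = torch.stack([F.softmax(m(x), dim=-1)
                             for m in cluster_models]).mean(dim=0)
        ce = F.nll_loss(torch.log(probs + 1e-8), targets)
        # Total-variation prior encouraging spatially smooth images.
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
             (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        (ce + tv_weight * tv).backward()
        opt.step()
    return x.detach()
```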
3. Bi-Level Cross-Cluster Optimization
While clustering mitigates average disagreement, each cluster’s information remains limited. Mixing data naively from other clusters risks negative transfer. FedBiCross proposes a bi-level optimization approach to learn adaptive cluster weights.
- Bi-Level Objective:
For cluster $g$ with learnable cross-cluster weights $w_g = (w_{g,1}, \dots, w_{g,G})$ on the probability simplex, the aggregated objective is:

$$\min_{w_g \in \Delta^{G-1}} \mathcal{L}_{\mathrm{out}}\big(\theta_g^{*}(w_g)\big)$$

with

$$\theta_g^{*}(w_g) = \arg\min_{\theta} \sum_{h=1}^{G} w_{g,h}\,\mathcal{L}_{\mathrm{KD}}\big(\theta;\, T_h, \hat{x}_h\big)$$

and $\mathcal{L}_{\mathrm{out}}$ the distillation loss of cluster $g$'s student against its own teacher $T_g$ on $\hat{x}_g$.
- Online Approximation: iteratively update $\theta_g$ and $w_g$ in alternating inner/outer steps at each synthesis iteration, projecting $w_g$ back onto the probability simplex after every outer update (sketched below).
This approach dynamically suppresses clusters that negatively impact knowledge transfer for a given cluster, enhancing the diversity and relevance of the cross-cluster guidance.
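The alternating scheme can be sketched as follows. The callables `kd_losses` (per-source-cluster distillation losses) and `outer_loss` (the cluster's own objective) are hypothetical stand-ins for the paper's exact terms, and the outer update uses a standard one-step first-order hypergradient rather than the authors' specific approximation; the simplex projection is the classical sort-based algorithm.

```python
import torch

def project_to_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based algorithm)."""
    u, _ = torch.sort(v, descending=True)
    css = torch.cumsum(u, dim=0) - 1.0
    idx = torch.arange(1, v.numel() + 1, dtype=v.dtype)
    rho = torch.nonzero(u * idx > css).max()   # last index meeting the condition
    theta = css[rho] / (rho + 1.0)
    return torch.clamp(v - theta, min=0.0)

def bilevel_step(student, weights, kd_losses, outer_loss,
                 inner_lr=1e-3, outer_lr=1e-2):
    """One alternating inner/outer update for a single cluster's student.

    kd_losses(student): tensor of shape (G,), distillation loss per source cluster.
    outer_loss(student): scalar objective on the cluster's own synthetic data.
    """
    params = list(student.parameters())
    # Inner step: SGD on the weighted cross-cluster distillation loss.
    inner = (weights * kd_losses(student)).sum()
    inner_grads = torch.autograd.grad(inner, params)
    with torch.no_grad():
        for p, g in zip(params, inner_grads):
            p -= inner_lr * g
    # Outer step: first-order hypergradient through the single inner step,
    # d L_out / d w_h  ~  -inner_lr * <grad_theta L_out, grad_theta L_h>.
    outer_grads = torch.autograd.grad(outer_loss(student), params)
    per_cluster = kd_losses(student)
    w_grad = torch.empty_like(weights)
    for h in range(weights.numel()):
        g_h = torch.autograd.grad(per_cluster[h], params, retain_graph=True)
        w_grad[h] = -inner_lr * sum((a * b).sum() for a, b in zip(outer_grads, g_h))
    return project_to_simplex(weights - outer_lr * w_grad)
```

The dot-product form follows from differentiating the one-step inner update $\theta - \eta \sum_h w_{g,h} \nabla \mathcal{L}_h$ with respect to $w_{g,h}$: clusters whose gradients oppose the outer objective receive shrinking weight, matching the suppression behavior described above.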
4. Personalized Distillation for Client Adaptation
Final student models $\theta_g$ for each cluster are fine-tuned on private data to yield a personalized model for each client.
- Initialization: $\theta_k \leftarrow \theta_{g(k)}$ for each client $k$, where $g(k)$ denotes $k$'s cluster.
- The fine-tuning objective combines:
- Fitting local data via cross-entropy
- A regularization term preserving cluster-level knowledge
- Retention of the original client model's bias
This stage enables client-specific adaptation while maintaining federated and cluster-level knowledge.
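A hedged sketch of this stage, assuming an $\ell_2$ proximal term for cluster-knowledge preservation and distillation from the client's original model for bias retention; the coefficients `lam_prox` and `lam_kd` are illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def personalize(student, cluster_model, orig_client_model, loader, epochs=5,
                lr=1e-3, lam_prox=0.1, lam_kd=0.1):
    """Fine-tune the cluster student into a client-specific model.

    Combines (i) cross-entropy on local data, (ii) a proximal term keeping the
    model near the cluster solution, and (iii) distillation from the client's
    original local model to retain its bias. Exact forms are assumptions.
    """
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    cluster_params = [p.detach().clone() for p in cluster_model.parameters()]
    orig_client_model.eval()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            logits = student(x)
            ce = F.cross_entropy(logits, y)                  # fit local data
            prox = sum(((p - q) ** 2).sum()                  # cluster knowledge
                       for p, q in zip(student.parameters(), cluster_params))
            with torch.no_grad():
                teacher = F.softmax(orig_client_model(x), dim=-1)
            kd = F.kl_div(F.log_softmax(logits, dim=-1), teacher,
                          reduction="batchmean")             # original client bias
            (ce + lam_prox * prox + lam_kd * kd).backward()
            opt.step()
    return student
```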
5. Algorithmic Workflow
The methodological pipeline of FedBiCross consists of three sequential stages as illustrated below.
| Stage | Core Operation | Output |
|---|---|---|
| Clustering/Data Synthesis | $K$-means grouping, deep-inversion synthetic data, teacher construction | Cluster data $\hat{x}_g$ / teachers $T_g$ |
| Bi-Level Optimization | Online weight adaptation, model updates per cluster | Models $\theta_g$, weights $w_g$ |
| Personalization | Fine-tuning with local client data | Personalized models $\theta_k$ |
The process is further detailed in stepwise pseudocode specifying model construction, synthetic data generation, inner-outer updates, and final adaptation.
6. Experimental Evaluation and Ablation Analyses
FedBiCross is empirically validated on four MedMNIST v2 datasets: BloodMNIST, DermaMNIST, OCTMNIST, and TissueMNIST. Non-IID splits are constructed via Dirichlet partitioning with concentration parameter $\alpha$ (sketched below). Experiments use varying client and cluster counts matched to each dataset's statistical complexity.
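For reference, a standard Dirichlet label-skew partitioner of the kind used here (a generic sketch, not the paper's exact split code); smaller `alpha` yields stronger skew.

```python
import numpy as np

def dirichlet_split(labels, num_clients, alpha, seed=0):
    """Partition sample indices into non-IID client shards via Dirichlet sampling.

    For each class, a Dirichlet(alpha) draw decides what fraction of that
    class's samples each client receives.
    """
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return [np.array(ix) for ix in client_idx]
```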
Compared baselines include FedAvg-1, DAFL, DENSE, FedISCA, and Co-Boosting. The principal metric is average test accuracy per client. Quantitative results report that FedBiCross consistently outperforms all baselines, with improvements ranging from roughly 10 to 30 accuracy points; for example, on BloodMNIST (under one reported client/cluster configuration) FedBiCross achieves 85.57% versus Co-Boosting's 54.75%.
Ablation studies confirm that:
- Intra-cluster-only schemes underperform by 5–10 points.
- Uniform and similarity-based cross-cluster weighting lag behind full bi-level weighting.
- Eliminating personalization drops performance by 7–15 points.
- Disabling clustering reduces accuracy by 10–20 points.
Qualitatively, synthetic samples generated by FedBiCross display well-defined medical structures, whereas competing approaches show artifacts or mode collapse.
7. Significance and Implications
FedBiCross combines clustering by prediction similarity, bi-level online optimization for knowledge selection, and personalized fine-tuning into a unified framework, improving OSFL feasibility in privacy-constrained, non-IID clinical environments. The demonstrated robustness to distributional skew and marked performance gains relative to state-of-the-art methods suggest that bi-level adaptation and careful sub-ensemble formation are critical for one-shot federated settings (Xia et al., 5 Jan 2026). A plausible implication is the generalizability of clustering-plus-bi-level optimization in wider federated setups beyond medical imaging, wherever client predictions exhibit strong heterogeneity.