FedBiCross: OSFL for Non-IID Medical Data

Updated 12 January 2026
  • FedBiCross is a bi-level optimization framework that enables data-free one-shot federated learning by aggregating decentralized models via clustering and adaptive weight optimization.
  • It employs K-means clustering on client prediction matrices and deep inversion techniques to generate synthetic data, mitigating uniform soft label issues in non-IID settings.
  • The framework achieves significant performance gains on MedMNIST datasets and supports personalized fine-tuning for privacy-sensitive clinical applications.

FedBiCross is a bi-level optimization framework introduced for data-free one-shot federated learning (OSFL) under non-IID (not independent and identically distributed) settings, with a particular emphasis on privacy-sensitive medical imaging data. Unlike conventional federated learning solutions that require multiple rounds of communication or direct access to raw data, FedBiCross performs knowledge aggregation in a single round by exchanging models only. It addresses the major challenge in OSFL: the destructive effect of aggregating predictions from non-IID clients, which can result in near-uniform soft labels and inadequate supervision for distillation (Xia et al., 5 Jan 2026).

1. OSFL under Non-IID Medical Data: Problem Formulation

In OSFL, each of $N$ clients $i \in \{1, \dots, N\}$ holds a private dataset $\mathcal{D}_i$ with substantial distribution skew. After clients upload their locally trained models $f_i$, the server is restricted to a single communication round and aims to produce a set of personalized models $\{f_i^{\mathrm{pers}}\}$ tailored to each client's data distribution.

Key notations:

  • $f_i(\bm x) \in \Delta^{C-1}$: Client $i$'s soft-prediction vector over $C$ classes.
  • $F(\bm x) = \frac{1}{N}\sum_{i=1}^N f_i(\bm x)$: Uniform ensemble teacher.
  • Knowledge distillation loss on synthetic inputs $\{\hat{\bm x}\}$:

$$L_{\mathrm{KD}}(G, F, \{\hat{\bm x}\}) = \sum_{\hat{\bm x}} \mathrm{KL}\bigl(F(\hat{\bm x}) \,\|\, G(\hat{\bm x})\bigr)$$

Under strong non-IID skew, the aggregated prediction $F(\hat{\bm x})$ can be nearly uniform, diminishing the supervisory signal available for student learning. The sketch below illustrates both quantities.
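A minimal sketch of these two quantities, assuming PyTorch and client models that output logits; the toy example at the end (three hypothetical over-confident clients) shows how naive averaging collapses toward a uniform soft label.

```python
import torch
import torch.nn.functional as TF

def ensemble_teacher(client_models, x):
    """F(x) = (1/N) * sum_i f_i(x): average of the clients' soft predictions."""
    probs = [torch.softmax(f(x), dim=-1) for f in client_models]
    return torch.stack(probs, dim=0).mean(dim=0)              # shape (batch, C)

def kd_loss(student, teacher_probs, x):
    """L_KD = KL(F(x) || G(x)); kl_div takes the student's log-probabilities."""
    return TF.kl_div(torch.log_softmax(student(x), dim=-1),
                     teacher_probs, reduction="batchmean")

# Toy illustration of the near-uniform-label failure mode: three clients that are
# each confident on a different class average out to an almost uniform soft label.
logits = [torch.tensor([[8.0, 0.0, 0.0]]),
          torch.tensor([[0.0, 8.0, 0.0]]),
          torch.tensor([[0.0, 0.0, 8.0]])]
avg = torch.stack([torch.softmax(z, dim=-1) for z in logits]).mean(0)
print(avg)   # roughly [0.33, 0.33, 0.33] -> little supervisory signal for distillation
```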

2. Client Clustering and Sub-Ensemble Construction

To counteract “teacher disagreement” from naïve averaging, FedBiCross applies a clustering stage based on output similarity of client models.

  • Output Similarity Measurement: Generate $M$ random noise inputs $\{\bm z_m\}$; for each client, construct the prediction matrix $\bm P_i = \bigl[f_i(\bm z_1), \dots, f_i(\bm z_M)\bigr] \in \mathbb{R}^{C \times M}$.
  • Clustering via $K$-Means: Partition clients into $K$ clusters $\{\mathcal{C}_k\}$ by minimizing the Frobenius-norm distortion

$$\min_{\{\mathcal{C}_k\}, \{\bm c_k\}} \sum_{k=1}^K \sum_{i \in \mathcal{C}_k} \|\bm P_i - \bm c_k\|_F^2$$

with centroids $\bm c_k$.

  • Sub-Ensemble Teacher Construction: For cluster $k$, define $F_k(\bm x) = \frac{1}{|\mathcal{C}_k|}\sum_{i \in \mathcal{C}_k} f_i(\bm x)$.
  • Deep Inversion for Synthetic Data: Synthetic batches for each cluster are generated via iterative gradient optimization:

$$\hat{\bm x}_k^{(t)} = \hat{\bm x}_k^{(t-1)} - \eta_s \nabla_{\hat{\bm x}} \mathcal{L}_{\mathrm{DI}}\bigl(\hat{\bm x}_k^{(t-1)}; F_k, y\bigr)$$

where $\mathcal{L}_{\mathrm{DI}}$ contains cross-entropy, total-variation, and batch-norm regularization terms.

Noise-adapted teachers $\tilde F_k$ are constructed by updating batch-normalization statistics along the inversion-based synthesis trajectory; a code sketch of this clustering-and-synthesis stage follows.
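A minimal sketch of this stage, assuming PyTorch and scikit-learn, client models that map image batches to logits, and illustrative shapes and hyperparameters ($M=64$, $3{\times}28{\times}28$ inputs, $\eta_s=0.05$). Only the cross-entropy term of $\mathcal{L}_{\mathrm{DI}}$ is shown; the total-variation and batch-norm regularizers, and the batch-norm statistic updates for $\tilde F_k$, are omitted.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def prediction_matrix(model, noise):
    """P_i = [f_i(z_1), ..., f_i(z_M)] in R^{C x M}, built from soft predictions."""
    return torch.softmax(model(noise), dim=-1).T                     # (C, M)

def cluster_clients(client_models, num_clusters, M=64, in_shape=(3, 28, 28)):
    """Group clients by output similarity on M shared random-noise probes; K-means on
    the flattened P_i minimizes exactly the Frobenius-norm objective above."""
    noise = torch.randn(M, *in_shape)
    feats = np.stack([prediction_matrix(f, noise).flatten().numpy()
                      for f in client_models])
    labels = KMeans(n_clusters=num_clusters, n_init=10, random_state=0).fit_predict(feats)
    return [np.where(labels == k)[0].tolist() for k in range(num_clusters)]

def sub_ensemble(client_models, members):
    """Cluster teacher F_k(x): mean soft prediction of the cluster's member clients."""
    def F_k(x):
        return torch.stack([torch.softmax(client_models[i](x), dim=-1)
                            for i in members]).mean(0)
    return F_k

def deep_inversion_step(x_hat, F_k, y, eta_s=0.05):
    """One synthesis update x^(t) = x^(t-1) - eta_s * grad L_DI (cross-entropy part only)."""
    x_hat = x_hat.detach().clone().requires_grad_(True)
    loss = torch.nn.functional.nll_loss(torch.log(F_k(x_hat) + 1e-8), y)
    loss.backward()
    return (x_hat - eta_s * x_hat.grad).detach()
```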

3. Bi-Level Cross-Cluster Optimization

While clustering mitigates the disagreement induced by naive averaging, each cluster's information remains limited, and naively mixing data from other clusters risks negative transfer. FedBiCross therefore learns adaptive cross-cluster weights via bi-level optimization.

  • Bi-Level Objective:

For cluster $k$ with learnable weights $\bm w_k = (w_{k,1}, \dots, w_{k,K})$, the bi-level objective is:

$$\begin{aligned} \bm w_k^* &= \arg\min_{\bm w_k} \sum_{t=1}^T \mathcal{L}_{\mathrm{KD}}^{(t)}\bigl(G_k^*(\bm w_k), F_k, \tilde F_k, \hat{\bm x}_k^{(t,\mathrm{val})}\bigr) \\ \text{s.t.}\quad G_k^*(\bm w_k) &= \arg\min_{G} \sum_{t=1}^T \sum_{j=1}^K w_{k,j}\, \mathcal{L}_{\mathrm{KD}}^{(t)}\bigl(G, F_j, \tilde F_j, \hat{\bm x}_j^{(t,\mathrm{train})}\bigr) \end{aligned}$$

with

$$\mathcal{L}_{\mathrm{KD}}^{(t)}(G, F, \tilde F, \hat{\bm x}) = \lambda^{(t)}\, \mathrm{KL}\bigl(\tilde F(\hat{\bm x}) \,\|\, G(\hat{\bm x})\bigr) + \bigl(1 - \lambda^{(t)}\bigr)\, \mathrm{KL}\bigl(F(\hat{\bm x}) \,\|\, G(\hat{\bm x})\bigr)$$

and $\lambda^{(t)} = 1 - t/T$, so supervision shifts from the noise-adapted teacher $\tilde F$ toward the original sub-ensemble $F$ as synthesis progresses.

  • Online Approximation: $G_k$ and $\bm w_k$ are updated in alternating inner/outer steps at each synthesis iteration, with $\bm w_k$ projected onto the probability simplex after every update (see the sketch below).

This approach dynamically suppresses clusters that negatively impact knowledge transfer for a given cluster, enhancing the diversity and relevance of the cross-cluster guidance.
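A minimal sketch of one such alternating update, under simplifying assumptions: the student $G_k$ is reduced to a linear probe on flattened inputs, the outer gradient is approximated by unrolling a single differentiable inner step, and the batch tensors (`train_x`, `train_F`, `train_Ftil`, `val_*`) are hypothetical placeholders for the synthetic data and teacher probabilities. This is one common online approximation, not necessarily the paper's exact schedule.

```python
import torch
import torch.nn.functional as TF

def project_simplex(w):
    """Euclidean projection onto the probability simplex (standard sort-based rule)."""
    u, _ = torch.sort(w, descending=True)
    css = torch.cumsum(u, dim=0) - 1.0
    rho = torch.nonzero(u * torch.arange(1, len(w) + 1) > css).max()
    return torch.clamp(w - css[rho] / (rho + 1.0), min=0.0)

def scheduled_kd(student_logits, F_probs, Ftil_probs, lam):
    """L_KD^(t) = lam * KL(F_tilde || G) + (1 - lam) * KL(F || G), with lam = 1 - t/T."""
    log_g = torch.log_softmax(student_logits, dim=-1)
    return (lam * TF.kl_div(log_g, Ftil_probs, reduction="batchmean")
            + (1 - lam) * TF.kl_div(log_g, F_probs, reduction="batchmean"))

# Illustrative sizes: flattened-input linear student, K clusters, T synthesis steps.
D, C, K, T = 3 * 28 * 28, 8, 4, 50
theta = torch.zeros(D, C, requires_grad=True)          # student G_k's parameters
w_k = torch.full((K,), 1.0 / K, requires_grad=True)    # cross-cluster weights
inner_lr, outer_lr = 0.1, 0.05

def bilevel_step(t, train_x, train_F, train_Ftil, val_x, val_F, val_Ftil):
    """One alternating inner/outer update at synthesis iteration t.
    train_x[j] is cluster j's synthetic batch (B, D); train_F[j] / train_Ftil[j] hold the
    corresponding teacher / noise-adapted-teacher probabilities (B, C); the val_* tensors
    are cluster k's own held-out synthetic batch."""
    global theta, w_k
    lam = 1.0 - t / T
    # Inner: one differentiable step on the w-weighted cross-cluster KD loss.
    inner = sum(w_k[j] * scheduled_kd(train_x[j] @ theta, train_F[j], train_Ftil[j], lam)
                for j in range(K))
    (g_theta,) = torch.autograd.grad(inner, theta, create_graph=True)
    theta_new = theta - inner_lr * g_theta
    # Outer: cluster k's validation KD loss through the unrolled step -> gradient wrt w_k.
    outer = scheduled_kd(val_x @ theta_new, val_F, val_Ftil, lam)
    (g_w,) = torch.autograd.grad(outer, w_k)
    with torch.no_grad():
        w_k -= outer_lr * g_w
        w_k.copy_(project_simplex(w_k))                # keep w_k on the simplex
    theta = theta_new.detach().requires_grad_(True)
```

Because the outer gradient flows through the unrolled inner step, clusters whose synthetic data hurt cluster $k$'s validation loss automatically receive smaller weight after projection.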

4. Personalized Distillation for Client Adaptation

Final student models $G_k^{(T)}$ for each cluster are fine-tuned on private data to yield personalized models for each client.

  • Initialization: $f_i^{\mathrm{pers}} \leftarrow G_k^{(T)}$ for $i \in \mathcal{C}_k$.
  • Objective combines:
    • Fitting local data via cross-entropy
    • Regularization for cluster knowledge preservation
    • Retention of original client bias

$$\mathcal{L}_{\mathrm{pers}} = \mathcal{L}_{\mathrm{CE}}\bigl(f_i^{\mathrm{pers}}(\bm x), y\bigr) + \gamma\, \mathrm{KL}\bigl(G_k^{(T)}(\bm x) \,\|\, f_i^{\mathrm{pers}}(\bm x)\bigr) + \delta\, \mathrm{KL}\bigl(f_i(\bm x) \,\|\, f_i^{\mathrm{pers}}(\bm x)\bigr)$$

This stage enables client-specific adaptation while maintaining federated and cluster-level knowledge.
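A minimal sketch of the personalization objective above, assuming PyTorch models that emit logits; the $\gamma$ and $\delta$ values are illustrative placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as TF

def personalization_loss(pers_model, cluster_student, client_model, x, y,
                         gamma=0.5, delta=0.5):
    """L_pers = CE(f_i^pers(x), y) + gamma * KL(G_k^(T)(x) || f_i^pers(x))
                                   + delta * KL(f_i(x)     || f_i^pers(x))."""
    logits = pers_model(x)
    log_p = torch.log_softmax(logits, dim=-1)
    with torch.no_grad():                                      # frozen reference models
        q_cluster = torch.softmax(cluster_student(x), dim=-1)  # cluster student G_k^(T)
        q_client = torch.softmax(client_model(x), dim=-1)      # original client model f_i
    return (TF.cross_entropy(logits, y)
            + gamma * TF.kl_div(log_p, q_cluster, reduction="batchmean")
            + delta * TF.kl_div(log_p, q_client, reduction="batchmean"))
```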

5. Algorithmic Workflow

The methodological pipeline of FedBiCross consists of three sequential stages as illustrated below.

| Stage | Core Operation | Output |
| --- | --- | --- |
| Clustering / data synthesis | $K$-means grouping, deep-inversion synthetic data, teacher construction | Cluster synthetic data and teachers $\{F_k\}$ |
| Bi-level optimization | Online weight adaptation, model updates per cluster | Models $\{G_k^{(T)}\}$, adapted weights $\{\bm w_k\}$ |
| Personalization | Fine-tuning with local client data | Personalized models $\{f_i^{\mathrm{pers}}\}$ |

The original paper further details the process in stepwise pseudocode specifying model construction, synthetic data generation, inner-outer updates, and final adaptation.

6. Experimental Evaluation and Ablation Analyses

FedBiCross is empirically validated on four MedMNIST v2 datasets: BloodMNIST, DermaMNIST, OCTMNIST, and TissueMNIST. Non-IID splits are constructed via Dirichlet$(\alpha)$ sampling with $\alpha \in \{0.1, 0.2, 0.3, 0.5\}$. Experiments use varying client and cluster counts matched to dataset and statistical complexity.
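The Dirichlet label-skew protocol referenced above is commonly implemented as follows; this NumPy sketch shows the standard recipe (per-class client proportions drawn from Dirichlet$(\alpha)$), not necessarily the paper's exact partitioning script, and the sizes are illustrative. Smaller $\alpha$ concentrates each class on fewer clients, giving the strongest heterogeneity at $\alpha = 0.1$.

```python
import numpy as np

def dirichlet_split(labels, num_clients, alpha, seed=0):
    """Partition sample indices into num_clients label-skewed shards: for each class,
    draw client proportions from Dirichlet(alpha) and allocate that class's samples
    accordingly (smaller alpha -> stronger skew)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx

# Example: 5 clients at alpha = 0.1, the strongest-skew setting reported above.
toy_labels = np.random.randint(0, 8, size=1000)
shards = dirichlet_split(toy_labels, num_clients=5, alpha=0.1)
print([len(s) for s in shards])
```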

Compared baselines include FedAvg-1, DAFL, DENSE, FedISCA, and Co-Boosting. The principal metric is average test accuracy per client. Quantitative results show that FedBiCross consistently outperforms all baselines, with improvements ranging from 10 to 30 points; e.g., on BloodMNIST with $N=5$ and $\alpha=0.1$, FedBiCross achieves 85.57% versus Co-Boosting's 54.75%.

Ablation studies confirm that:

  • Intra-cluster-only schemes underperform by 5–10 points.
  • Uniform and similarity-based cross-cluster weighting lag behind full bi-level weighting.
  • Eliminating personalization drops performance by 7–15 points.
  • Disabling clustering reduces accuracy by 10–20 points.

Qualitatively, synthetic samples generated by FedBiCross display defined medical structures, whereas competing approaches show artifacts or mode collapse.

7. Significance and Implications

FedBiCross combines clustering by prediction similarity, bi-level online optimization for knowledge selection, and personalized fine-tuning into a unified framework, improving OSFL feasibility in privacy-constrained, non-IID clinical environments. The demonstrated robustness to distributional skew and marked performance gains relative to state-of-the-art methods suggest that bi-level adaptation and careful sub-ensemble formation are critical for one-shot federated settings (Xia et al., 5 Jan 2026). A plausible implication is the generalizability of clustering-plus-bi-level optimization in wider federated setups beyond medical imaging, wherever client predictions exhibit strong heterogeneity.
