FedOAP: Organ-Agnostic Tumor Segmentation
- Organ-agnostic tumor segmentation is an approach that unifies multi-organ data using federated learning and CLIP-based embeddings to delineate tumors without relying on organ-specific models.
- FedOAP integrates personalized federated optimization with decoupled cross-attention and global feature aggregation, ensuring robust performance while preserving data sovereignty.
- The framework employs a boundary-focused perturbation loss to sharpen tumor delineation, achieving higher Dice scores than the compared federated baselines across breast DCE-MRI, brain MRI, and liver CT clients.
Organ-agnostic tumor segmentation refers to learning frameworks and federated algorithms that achieve robust tumor detection and delineation across multiple organs, without reliance on organ-specific modeling or full co-annotation of training data. The Federated Organ-Agnostic Pipeline (“FedOAP”) is a leading approach that combines personalized federated learning, shared global feature aggregation, and privacy-preserving mechanisms to improve cross-organ tumor segmentation performance while maintaining data sovereignty across disparate sites (Tashdeed et al., 24 Nov 2025). Integration with universal, CLIP-driven segmentation models further enhances scalability, semantic flexibility, and adaptation to novel tumor or organ classes (Liu et al., 2023).
1. Foundations of Organ-Agnostic Tumor Segmentation
Conventional segmentation methodologies in medical imaging predominantly target single-organ or site-specific tumor delineation, often hampered by data fragmentation, partial labels, and annotation inconsistencies across datasets. Organ-agnostic paradigms address these limitations by learning models that operate over unified multi-organ label spaces, propagate anatomical knowledge across tasks, and generalize segmentation capabilities to arbitrary organ/tumor types. The development of these frameworks is closely tied to advances in large-scale dataset assembly, semantic label embedding (notably via CLIP), and federated learning protocols enabling multi-institutional collaboration without direct data sharing (Liu et al., 2023, Tashdeed et al., 24 Nov 2025).
2. CLIP-Driven Universal Models for Organ-Agnostic Segmentation
The CLIP-driven universal model introduces a text-driven embedding pipeline for both organ and tumor segmentation. It leverages a pre-trained CLIP text encoder (e.g., ViT-B/32) to derive a fixed embedding for the medical prompt of each target class (25 organs, 6 types of tumors). These prompt-based embeddings serve as a semantic scaffold, capturing anatomical relationships (e.g., through the cosine similarity between class embeddings) and unifying the taxonomy across 14 public CT datasets (31 classes after resolving overlaps and label splits).
A 3D encoder (e.g., Swin UNETR) processes the volumetric scan to extract global feature maps, which are concatenated with the CLIP text embeddings to form joint prompt vectors. These are used to parameterize a text-driven segmentor that generates class-specific probability maps $P_c$. Training employs masked loss computation over only the locally annotated classes $\mathcal{A}$ (partial-label masking), optimizing a combination of cross-entropy and Dice objectives:
$$\mathcal{L} = \sum_{c \in \mathcal{A}} \left[ \mathcal{L}_{\mathrm{CE}}(P_c, Y_c) + \mathcal{L}_{\mathrm{Dice}}(P_c, Y_c) \right],$$
where $Y_c$ is the binary target mask for class $c$.
This design enables seamless extension to new classes by simply providing their textual prompt; retraining is unnecessary, provided a CLIP embedding is computable (Liu et al., 2023).
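The partial-label masking described above can be illustrated with a minimal sketch. It assumes flattened per-class probability maps stored in plain dictionaries, an unweighted sum of binary cross-entropy and soft Dice, and hypothetical class names; none of these details are taken from the cited implementation.

```python
import math

def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy over flattened voxels."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
    return total / len(p)

def dice_loss(p, y, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P.Y| + s) / (|P| + |Y| + s)."""
    inter = sum(pi * yi for pi, yi in zip(p, y))
    return 1.0 - (2.0 * inter + smooth) / (sum(p) + sum(y) + smooth)

def masked_partial_label_loss(probs, targets, annotated):
    """Sum BCE + Dice over the annotated classes only.

    probs, targets: dict mapping class name -> flat list of voxel values.
    annotated: set of class names that are labeled in this dataset.
    """
    loss = 0.0
    for name in probs:
        if name not in annotated:
            continue  # unlabeled class: no loss term, hence no gradient
        loss += bce(probs[name], targets[name]) + dice_loss(probs[name], targets[name])
    return loss

# Hypothetical dataset that labels liver and liver tumor but not kidney.
probs = {
    "liver": [0.9, 0.8, 0.1, 0.2],
    "liver_tumor": [0.7, 0.1, 0.1, 0.1],
    "kidney": [0.5, 0.5, 0.5, 0.5],
}
targets = {
    "liver": [1, 1, 0, 0],
    "liver_tumor": [1, 0, 0, 0],
    "kidney": [0, 0, 0, 0],  # placeholder; ignored because kidney is unlabeled
}
print(masked_partial_label_loss(probs, targets, annotated={"liver", "liver_tumor"}))
```

Because the kidney channel is skipped entirely, its predictions can take any value without changing the loss, which is what lets datasets with different label sets train a single model.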
3. The FedOAP Architecture and Federated Optimization
FedOAP targets personalized federated segmentation under non-IID conditions, allowing clients (e.g., hospitals) with organ- or tumor-specific data to collaboratively learn an organ-agnostic tumor segmentor (Tashdeed et al., 24 Nov 2025). Its architecture is based on a multi-branch U-Net:
- Shared encoder–decoder (global): Consists of four down-sampling stages, a bottleneck split into “query” and “key-value” arms (of feature dimension $d$), and a mirrored up-sampling decoder. All key-value and decoder parameters (denoted $\theta_g$) are synchronized via federated aggregation.
- Personalized client heads: Each site maintains private query projections and a spatial adapter in the decoder (two 3×3 conv layers with residual connection). Local updates are never transmitted, preserving local specificity and confidentiality.
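- Decoupled cross-attention (DCA): At each global communication round, each client $i$ constructs its query features $Q_i$ privately, while keys ($K_i$) and values ($V_i$) are aggregated across clients into $K_g$ and $V_g$, which the client then attends to in standard scaled dot-product form:
$$\mathrm{DCA}(Q_i, K_g, V_g) = \mathrm{softmax}\!\left(\frac{Q_i K_g^{\top}}{\sqrt{d}}\right) V_g,$$
where $d$ is the key dimension.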
This mechanism enables each client to attend globally to high-level inter-organ (inter-client) features while retaining site-specific queries, facilitating both information transfer and privacy.
- Training protocol (overview):
  - Alternating rounds of local update (for the private client parameters $\theta_i$) and federated averaging (for the shared parameters $\theta_g$).
  - Final models merge the latest $\theta_g$ and the locally optimized $\theta_i$.
  - Fine-tuning involves a boundary-focused Perturbed Boundary Loss (PBL).
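As a rough illustration of the decoupled cross-attention step, the sketch below keeps a client's queries local and lets them attend to keys and values pooled across clients. The unweighted mean used for pooling, the scaled dot-product form, and all function names are assumptions made for this example rather than details confirmed by the FedOAP paper.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def mean_matrices(mats):
    """Element-wise mean of equally shaped matrices (assumed aggregation rule)."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) / n for j in range(cols)] for i in range(rows)]

def decoupled_cross_attention(q_local, k_global, v_global):
    """Private queries attend to globally aggregated keys and values."""
    d = len(q_local[0])
    scores = matmul(q_local, transpose(k_global))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v_global)

# Three hypothetical clients, each contributing 2 key/value tokens of width 2.
client_keys = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
client_values = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 0.0], [0.0, 2.0]],
    [[0.0, 1.0], [3.0, 0.0]],
]
k_global = mean_matrices(client_keys)
v_global = mean_matrices(client_values)

q_private = [[0.0, 0.0], [1.0, -1.0]]  # stays on the client
out = decoupled_cross_attention(q_private, k_global, v_global)
print(out)
```

Each output row is a convex combination of the pooled value rows, so a client only ever sees aggregated features, while the queries that select among them are never transmitted.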
4. Boundary-Focused Personalization: Perturbed Boundary Loss (PBL)
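To enhance segmentation precision, particularly for complex or fuzzy tumor boundaries, FedOAP introduces PBL during local fine-tuning (Tashdeed et al., 24 Nov 2025). For each sample with reference mask $y$, logits $z$, and intermediate prediction $\hat{y}$:
- Inconsistency mask: $M = 1$ where $\hat{y}$ is inconsistent with the reference mask beyond a threshold $\tau$, $0$ otherwise.
- Perturbed logits: $\tilde{z} = z + M \odot \epsilon$, with $\epsilon \sim \mathcal{N}(0, \sigma^2)$.
- Composite loss: Clean loss $\mathcal{L}_{\mathrm{clean}}$ (BCE + Dice) on the unperturbed outputs; perturbed loss $\mathcal{L}_{\mathrm{pert}}$ on the perturbed outputs.
- Final fine-tuning objective: $\mathcal{L}_{\mathrm{PBL}} = \mathcal{L}_{\mathrm{clean}} + \lambda \mathcal{L}_{\mathrm{pert}}$, with weighting coefficient $\lambda$.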
This dual supervision sharpens boundary localization, focusing learning on uncertain or misaligned mask regions.
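A minimal sketch of such a perturbed boundary loss is given below. It assumes that the inconsistency mask flags voxels whose predicted probability differs from the label by more than `tau`, that the perturbation is Gaussian noise added to the flagged logits, and that the two loss terms are combined with a weight `lam`. The default values of `sigma` and `lam` are hypothetical; `tau=0.75` is simply taken from the middle of the threshold range reported for the method.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_dice(probs, y, eps=1e-7, smooth=1.0):
    """Binary cross-entropy plus soft Dice loss on flattened voxels."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    bce = -sum(t * math.log(p) + (1 - t) * math.log(1.0 - p)
               for p, t in zip(clipped, y)) / len(y)
    inter = sum(p * t for p, t in zip(probs, y))
    dice = 1.0 - (2.0 * inter + smooth) / (sum(probs) + sum(y) + smooth)
    return bce + dice

def perturbed_boundary_loss(logits, y, tau=0.75, sigma=0.1, lam=1.0, rng=None):
    """Return (clean + lam * perturbed loss, inconsistency mask)."""
    rng = rng or random.Random(0)
    probs = [sigmoid(z) for z in logits]
    # Flag voxels where the prediction disagrees strongly with the label.
    mask = [1 if abs(p - t) > tau else 0 for p, t in zip(probs, y)]
    # Add Gaussian noise to the flagged logits only.
    perturbed = [z + m * rng.gauss(0.0, sigma) for z, m in zip(logits, mask)]
    clean = bce_dice(probs, y)
    pert = bce_dice([sigmoid(z) for z in perturbed], y)
    return clean + lam * pert, mask

logits = [4.0, 2.0, -3.0, 3.0]
labels = [1, 1, 0, 0]  # the last voxel is confidently wrong
loss, mask = perturbed_boundary_loss(logits, labels)
print(mask, loss)
```

In this toy input only the last voxel is flagged, so the noise (and the extra supervision it induces) is concentrated on the one location where prediction and label disagree.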
5. Quantitative Performance and Comparative Results
FedOAP demonstrates state-of-the-art segmentation accuracy across diverse organ and modality combinations. On datasets with three clients (Breast DCE-MRI, Brain MRI, Liver CT):
| Method | BreastDM Dice (%) | BraTS Dice (%) | LiTS Dice (%) | Avg (%) |
|---|---|---|---|---|
| FedAvg | 50.91±3.62 | 1.28±1.17 | 9.71±0.95 | 20.63 |
| FedPer | 59.91±23.47 | 19.15±23.28 | 42.06±24.12 | 40.37 |
| FedRep | 85.92±1.99 | 75.56±19.09 | 49.75±2.94 | 70.41 |
| FedDP | 91.47±0.72 | 94.04±0.56 | 83.88±3.27 | 89.80 |
| FedOAP | 94.39±0.62 | 95.31±0.41 | 87.22±0.57 | 92.31 |
Addition of DCA, spatial adapter, and PBL yields stepwise improvements (up to +69.4 points over baseline on BraTS), with FedOAP consistently outperforming both naive and parameterized personalization strategies.
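The averages in the table can be re-derived from the per-dataset means with a short check. The sketch below uses only the mean Dice values listed above (standard deviations are ignored) and also computes FedOAP's per-dataset margin over FedDP, the strongest baseline in the table.

```python
# Mean Dice (%) per client dataset, copied from the table above.
dice = {
    "FedAvg": {"BreastDM": 50.91, "BraTS": 1.28, "LiTS": 9.71},
    "FedPer": {"BreastDM": 59.91, "BraTS": 19.15, "LiTS": 42.06},
    "FedRep": {"BreastDM": 85.92, "BraTS": 75.56, "LiTS": 49.75},
    "FedDP": {"BreastDM": 91.47, "BraTS": 94.04, "LiTS": 83.88},
    "FedOAP": {"BreastDM": 94.39, "BraTS": 95.31, "LiTS": 87.22},
}

# Unweighted average across the three clients, rounded as in the table.
averages = {m: round(sum(s.values()) / len(s), 2) for m, s in dice.items()}

# Per-dataset gain of FedOAP over FedDP, in Dice points.
gain_over_feddp = {
    d: round(dice["FedOAP"][d] - dice["FedDP"][d], 2) for d in dice["FedOAP"]
}

print(averages)
print(gain_over_feddp)
```

The recomputed averages match the Avg column, and the margin over FedDP works out to 2.92, 1.27, and 3.34 points on BreastDM, BraTS, and LiTS respectively.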
Zero-shot adaptation and rapid fine-tuning generalize to unseen organ data. On lung CT, FedOAP achieves 7.1% zero-shot Dice (vs 1.1% for FedDP), and 72.3% Dice after short fine-tuning (vs 65.4%) (Tashdeed et al., 24 Nov 2025).
Integration with CLIP-driven universal models enables further advancements. On the Medical Segmentation Decathlon, the universal model achieves higher Dice than nnUNet on liver tumor (+1.53 pp), pancreas tumor (+5.38 pp), and tumor tasks overall (approximately +4.1 pp). On the BTCV benchmark, average organ Dice improves by +4.07 pp (Liu et al., 2023).
6. Architectural, Computational, and Federated Properties
- Efficiency: FLOPs per scan for universal architectures are 19× lower than nnUNet and 6× lower than Swin UNETR at the evaluated input size, streamlining on-site deployment for clinical use (Liu et al., 2023).
- Communication footprint: FedOAP transmits approximately 130 MB per client per round, comparable to traditional federated approaches.
- Scalability: Algorithmic FLOPs scale linearly in the number of clients. As few as 5 communication rounds suffice for convergence in cross-organ settings.
- Partial label accommodation: Training protocols mask gradient computations to only available class annotations, allowing clients to contribute without full multi-organ/tumor labels.
- Semantic extendability: Addition of new anatomical or tumor categories entails only prompt embedding extension (in CLIP-driven models), with no retraining required, provided text-branch weights are fixed.
- Privacy: Query projections and local adapters are private to each client, and cross-client feature exchange is restricted to key–value aggregations, reducing potential for information leakage.
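The split between shared and private parameters can be sketched as a federated averaging step that touches only the shared entries. The parameter names (`kv_proj`, `decoder`, `query_proj`, `adapter`), the flat-list representation of weights, and the unweighted mean are all assumptions for illustration; a real system might weight clients by dataset size.

```python
def fedavg_shared(client_states, shared_keys):
    """Average only the shared entries; private entries are never read."""
    n = len(client_states)
    global_update = {}
    for key in shared_keys:
        vectors = [state[key] for state in client_states]
        global_update[key] = [sum(vals) / n for vals in zip(*vectors)]
    return global_update

def apply_global(state, global_update):
    """Overwrite a client's shared entries, leaving private ones untouched."""
    merged = dict(state)
    merged.update(global_update)
    return merged

SHARED = ("kv_proj", "decoder")  # hypothetical names for synchronized weights

clients = [
    {"kv_proj": [1.0, 2.0], "decoder": [0.0, 4.0], "query_proj": [9.0], "adapter": [1.0]},
    {"kv_proj": [3.0, 4.0], "decoder": [2.0, 0.0], "query_proj": [7.0], "adapter": [2.0]},
    {"kv_proj": [5.0, 6.0], "decoder": [4.0, 2.0], "query_proj": [5.0], "adapter": [3.0]},
]

update = fedavg_shared(clients, SHARED)
clients = [apply_global(c, update) for c in clients]
print(update)
print([c["query_proj"] for c in clients])
```

Only the entries listed in `SHARED` would be serialized and sent in each round, so the communication cost depends on the size of the shared parameters rather than the full model.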
7. Guidelines and Future Directions for Organ-Agnostic and Federated Segmentation
Practical insights derived from empirical and ablation studies include:
- Organ-agnostic feature sharing via decoupled cross-attention accelerates adaptation, even for previously unseen organs, harnessing priors from heterogeneous tasks.
- Brief local fine-tuning (e.g., two epochs) using boundary-focused perturbation suffices to personalize models, yielding substantial Dice improvements.
- Reported hyperparameter regimes for PBL pair a boundary threshold in [0.7, 0.8] with a tuned noise variance, balancing targeted supervision and training stability.
- Implementation on standard U-Net backbones is straightforward, requiring minimal plug-in of DCA, spatial adapters, and PBL components.
- A plausible implication is that universal CLIP-driven backbones combined with federated DCA personalization (as in FedOAP) will remain a dominant framework for robust tumor detection as federated hubs and prospective datasets proliferate.
Organ-agnostic federated segmentation thus provides both a computationally efficient and semantically flexible solution to the challenges of large-scale, privacy-preserving, and continuously evolving multi-organ tumor detection in medical imaging (Liu et al., 2023, Tashdeed et al., 24 Nov 2025).