
CrossMoDA 2023 Challenge

Updated 8 January 2026
  • CrossMoDA 2023 is an international challenge focused on unsupervised segmentation of vestibular schwannoma and cochlea on MRI, leveraging extreme domain shifts between ceT1 and T2 sequences.
  • The competition employs a multi-institutional, heterogeneous dataset and innovative pipelines like Vandy365 that integrate image translation, style transfer, and robust nnU-Net segmentation.
  • Quantitative metrics such as Dice and ASSD demonstrate improved tumor segmentation while highlighting ongoing challenges in small-structure delineation, underscoring clinical impact.

The Cross-Modality Domain Adaptation (CrossMoDA) 2023 challenge is an international competition focused on unsupervised cross-modal segmentation of vestibular schwannoma (VS) and cochlea in magnetic resonance imaging (MRI). Organized in conjunction with the MICCAI conference series, CrossMoDA serves as a dynamic benchmark for testing domain adaptation algorithms, leveraging the extreme domain shift between contrast-enhanced T1-weighted (ceT1) and high-resolution T2-weighted (T2) sequences. Its primary motivation is to automate VS and cochlea segmentation on cost-effective routine T2 scans, improving clinical workflows and broadening access to precision otoneurosurgical planning. The 2023 edition marks a significant evolution: it incorporated multi-institutional, highly heterogeneous routine data and refined the segmentation task into intra- and extra-meatal tumour compartments, intentionally increasing the clinical and methodological challenge (Wijethilake et al., 13 Jun 2025).

1. Multi-Institutional, Heterogeneous Dataset Design

The 2023 CrossMoDA dataset comprises 959 MRI scans partitioned across three distinct cohorts and multiple institutions (see Table 1 of (Wijethilake et al., 13 Jun 2025)):

| Institution | Source | ceT1 (Train) | Target T2 (Train) | Validation | Test | MRI Protocols |
|---|---|---|---|---|---|---|
| London | SC-GK (UK) | 79 | 85 | 32 | 113 | MP-RAGE (ceT1), 3D-CISS/FIESTA (T2) |
| Tilburg | SC-GK (Netherlands) | 105 | 105 | 32 | 134 | 3D-FFE (ceT1), 3D-TSE (T2) |
| UK | MC-RC (10 UK sites) | 38 ceT1 / 43 T2 | 47 T2 | 15 | 45 | Varied protocols, 1.0–3.0 T, Siemens/Philips/GE/Hitachi |

The MC-RC cohort reflects real-world scan diversity, with voxel sizes spanning 0.01–1 mm³, slice thickness between 1–5 mm, and a variety of magnet field strengths and vendors. Figure 1 in (Wijethilake et al., 13 Jun 2025) demonstrates pronounced inter-site variability in intensity histograms and structural representations, presenting a severe domain adaptation challenge.

2. Refined Segmentation Task Definition and Clinical Relevance

For 2023, the segmentation objective was set as a three-class problem on T2-weighted MRI without Koos grading:

  1. Intra-meatal VS (within the internal auditory canal)
  2. Extra-meatal VS (in the cerebellopontine angle)
  3. Bilateral cochlea

Sub-compartment delineation followed the posterior petrous wall per Kanzaki et al. The Koos grading classification, a binary task present in 2022, was omitted in 2023 to increase granularity within the tumour segmentation task, reflecting evolving clinical requirements for surgical planning and risk stratification (Wijethilake et al., 13 Jun 2025).

3. Winning Approach: Vandy365 Pipeline

The top-ranked 2023 solution (Vandy365) implemented a multi-stage unsupervised domain adaptation pipeline combining advanced image-to-image translation, style transfer, segmentation, and data augmentation techniques:

  • Image-to-image translation used an extended 3D QS-Attn architecture with adversarial loss and patch-wise contrastive InfoNCE loss:

L_{\mathrm{adv}} = \mathbb{E}_{y\sim T}[\log D(y)] + \mathbb{E}_{x\sim S}[\log(1 - D(G(x)))]
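This standard GAN adversarial objective (real target-domain images in the log D term, translated images in the log(1 − D) term) can be sketched numerically. The helper below is an illustrative NumPy sketch, not the challenge code, and assumes discriminator scores already squashed to (0, 1):

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """Value of the adversarial objective the discriminator maximises.

    d_real: D(y) scores in (0, 1) for real target-domain (T2) images.
    d_fake: D(G(x)) scores in (0, 1) for ceT1 images translated to T2.
    """
    d_real = np.asarray(d_real, dtype=np.float64)
    d_fake = np.asarray(d_fake, dtype=np.float64)
    # E_{y~T}[log D(y)] + E_{x~S}[log(1 - D(G(x)))]
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

A perfectly confused discriminator (all scores 0.5) yields 2·log(0.5) ≈ −1.386, the theoretical equilibrium value.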

  • Site-specific style transfer leveraged dynamic instance normalization codes to synthesize heterogeneous T2 MRI appearance.
  • Segmentation was performed using 3D full-resolution nnU-Net backbones (combining U-Net and ResU-Net variants), followed by self-training with iterative pseudo-labeling on unlabelled real T2 scans.
  • Data augmentation incorporated random style interpolation, structure-wise intensity scaling (±50%), and oversampling of challenging cases.
  • Model ensembling fused outputs of 11 model instances, optimizing for robustness.
  • Loss functions blended standard Dice:

L_{\mathrm{Dice}} = 1 - \frac{2\sum_i p_{k,i}g_{k,i}}{\sum_i p_{k,i} + \sum_i g_{k,i}}

with cross-entropy and adversarial-consistency objectives in the translation phase.
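The soft Dice term above maps directly to code. This is a minimal NumPy sketch for a single class k, with a small epsilon added for numerical stability (an implementation detail not specified in the paper):

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for one class k.

    probs:  predicted probabilities p_{k,i} over voxels i (any shape).
    target: binary ground-truth mask g_{k,i}, same shape as probs.
    """
    probs = probs.ravel().astype(np.float64)
    target = target.ravel().astype(np.float64)
    intersection = 2.0 * np.sum(probs * target)   # 2 * sum_i p_i * g_i
    denom = np.sum(probs) + np.sum(target) + eps  # sum_i p_i + sum_i g_i
    return 1.0 - intersection / denom
```

A perfect binary prediction drives the loss to (near) zero; a fully disjoint one drives it to one.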

Post-processing retained only the largest connected component for each class, mitigating spurious predictions (Wijethilake et al., 13 Jun 2025).
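The largest-connected-component filter is a standard post-processing step; the Vandy365 implementation details are not published, but a typical version using SciPy's `ndimage` (applied per class, here with 26-connectivity as an assumption) looks like this:

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask):
    """Retain only the largest 26-connected component of a binary 3-D mask."""
    labeled, n = ndimage.label(mask, structure=np.ones((3, 3, 3)))
    if n == 0:
        return mask  # empty prediction: nothing to filter
    # voxel count of each component, labels 1..n
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))
    return (labeled == (np.argmax(sizes) + 1)).astype(mask.dtype)
```

Note that applying this to the bilateral cochlea class would keep only one cochlea, so in practice such a filter is usually restricted to the tumour classes or applied per side.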

4. Quantitative Results and Comparative Analysis

Performance was evaluated using Dice Similarity Coefficient (DSC) and Average Symmetric Surface Distance (ASSD), aggregated across 341 test cases:

| Region | DSC Median [IQR] (%) | ASSD Median [IQR] (mm) |
|---|---|---|
| VS (all) | 87.1 [83.3–90.2] | 0.40 [0.32–0.51] |
| Intra-meatal VS | 74.9 [68.8–80.0] | 0.43 [0.32–0.57] |
| Extra-meatal VS | 87.3 [82.7–90.4] | 0.40 [0.32–0.52] |
| Cochlea | 84.1 [81.7–86.0] | 0.21 [0.13–0.25] |
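ASSD averages, over all surface voxels of both masks, the distance to the nearest surface voxel of the other mask. The sketch below is one common formulation using SciPy distance transforms; the challenge's exact evaluation code may differ in surface extraction and border handling:

```python
import numpy as np
from scipy import ndimage

def assd(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Average Symmetric Surface Distance between two binary 3-D masks (mm)."""
    def surface(mask):
        # surface = mask voxels removed by one erosion step
        return mask & ~ndimage.binary_erosion(mask)

    sp, sg = surface(pred.astype(bool)), surface(gt.astype(bool))
    # distance from every voxel to the nearest surface voxel of each mask,
    # with anisotropic voxel spacing taken into account
    dt_gt = ndimage.distance_transform_edt(~sg, sampling=spacing)
    dt_pred = ndimage.distance_transform_edt(~sp, sampling=spacing)
    dists = np.concatenate([dt_gt[sp], dt_pred[sg]])
    return dists.mean()
```

Identical masks give an ASSD of exactly zero; any spatial disagreement yields a positive distance in millimetres.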

Outlier analysis (Figures 2–4, (Wijethilake et al., 13 Jun 2025)) revealed a significant reduction in catastrophic segmentation failures for VS, particularly extra-meatal, with persistent challenges for the cochlea attributed to anatomical smallness and faint image contrast.

Retrospective winner comparison indicated:

  • On the homogeneous London SC-GK dataset, 2023 VS DSC (88.6%) marginally surpassed the 2022 winner (88.5%), while cochlea DSC decreased slightly (86.3% vs. 87.1%).
  • Data heterogeneity played a central role: broader protocol diversity increased segmentation robustness for tumors but imposed penalties for the cochlea, demonstrating a trade-off between generalizability and small-structure sensitivity (Figure 2).

5. Representative Methodologies: “Out-of-the-Box” Translation and Segmentation

Notable submissions include an approach employing the CUT model for unpaired image-to-image translation and nnU-Net for segmentation (Choi, 2021):

L_{\mathrm{NCE}}^l = \mathbb{E}_{x}\Bigg[ -\sum_{i=1}^{N_l} \log\Bigg( \frac{\exp(q_i \cdot k_i / \tau)}{\sum_{j=1}^{N_l} \exp(q_i \cdot k_j / \tau)} \Bigg) \Bigg]
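In this patch-wise InfoNCE loss, each query feature q_i (from the translated image) is matched against its positive key k_i (the co-located patch in the source image), with the other patches in the same image acting as negatives. A simplified NumPy sketch, assuming L2-normalised features and positives on the diagonal of the similarity matrix:

```python
import numpy as np

def patch_nce_loss(q, k, tau=0.07):
    """PatchNCE-style InfoNCE loss at one feature layer.

    q: (N, d) query features from the translated image.
    k: (N, d) key features from the source image; k[i] is the positive for q[i].
    """
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / tau                        # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal
```

When every query is closest to its own key the loss approaches zero; mismatched pairs drive it up sharply at this small temperature.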

  • Data Augmentation: “Augmented Tumor”—artificial reduction of tumor intensity by 50% in synthetic T2 slices to simulate real-world signal heterogeneity. Resulting dataset sizes increased from 105 to 210, improving resilience to intensity variation.
  • Segmentation: 3D full-resolution nnU-Net configured with default parameters (U-Net, 5 depth levels, instance normalization, deep supervision).
  • Ensembling: Averaging softmax outputs across five cross-validated models.
  • Results: Test set mean Dice 0.8253 (Tumor: 0.8288; Cochlea: 0.8217), placing third on the leaderboard. FID score for generated images: 11.15 (CUT), outperforming FastCUT (32.85). The method demonstrated stable, plug-and-play segmentation with minimal custom architecture tuning (Choi, 2021).
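The "Augmented Tumor" step amounts to scaling intensities inside the tumour mask of a synthetic T2 slice, producing a second, dimmer training sample per image (105 → 210 in the cited setup). A minimal sketch, with the function name and the in-place copy semantics chosen here for illustration:

```python
import numpy as np

def augment_tumor(image, tumor_mask, scale=0.5):
    """'Augmented Tumor'-style augmentation: dim the tumour signal.

    image:      synthetic T2 slice (or volume) as a float/int array.
    tumor_mask: binary mask of the tumour region, same shape as image.
    scale:      intensity multiplier inside the mask (0.5 = 50% reduction).
    """
    out = image.astype(np.float64).copy()  # leave the original untouched
    out[tumor_mask.astype(bool)] *= scale
    return out
```

Pairing each original slice with its scaled copy teaches the segmenter to tolerate the real-world signal heterogeneity of VS on T2.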

6. Significance of Data Heterogeneity and Performance Trade-offs

Expanded data heterogeneity led to a decrease in outlier frequency and improved performance for VS segmentation, even on homogeneous test sets. This suggests that style variation and routine-surveillance representations counteract overfitting to high-resolution planning protocols. Conversely, segmentation of the cochlea, a small organ at risk, suffered under increased data diversity due to boundary ambiguity and limited network capacity in the three-class setting. The plateauing of VS Dice scores across sequential challenge editions indicates that dataset expansion is approaching a limiting regime for current algorithmic strategies; a plausible implication is that future benchmarks require even more severe cross-modal shifts (Wijethilake et al., 13 Jun 2025).

7. Future Directions and Recommendations

Authors recommend further increasing clinical challenge complexity, for example, by extending cross-modal benchmarks to include ultrasound or leveraging the ReMIND dataset for MRI–ultrasound adaptation. Small-structure segmentation (e.g., cochlea) may benefit from local refinement, advanced priors, or more expressive architectures. Multi-metric, rank-then-aggregate evaluations (DSC + ASSD) will remain essential for robust, clinically meaningful leaderboard stability. Reintroduction of classification or prediction tasks (such as VS growth) is encouraged once participation is sufficient (Wijethilake et al., 13 Jun 2025).

Challenges remain for clinical deployment, especially regarding small organ segmentation under high protocol variability. Continuous evolution of the CrossMoDA series reflects a research-driven push towards benchmark tasks that better represent the realities of modern neuroimaging and surgical planning.
