
CLMU-Net: Modality-Agnostic 3D Brain Segmentation

Updated 27 January 2026
  • CLMU-Net is a continual learning architecture that adapts to sequential, heterogeneous MRI modalities without predefined channels, effectively mitigating catastrophic forgetting.
  • It integrates domain-conditioned textual guidance by fusing frozen BioBERT embeddings with 3D U-Net bottleneck features, thereby enhancing segmentation with global context.
  • A dual-criterion, lesion-aware experience replay buffer strategically selects both prototypical and challenging cases, yielding significant improvements in Dice scores.

CLMU-Net is a continual learning (CL) framework for 3D brain lesion segmentation designed to operate modality-agnostically across sequentially arriving, heterogeneous multi-modal MRI datasets. Built on a 3D U-Net backbone, CLMU-Net integrates three primary innovations: modality-flexible channel inflation, domain-conditioned textual guidance, and a compact, lesion-aware experience replay buffer. This architecture enables a single model to adapt to arbitrary and variable MRI modality inputs, injects explicit cohort-level global priors via frozen BioBERT textual embeddings at the bottleneck, and strategically rehearses both prototypical and challenging samples to mitigate catastrophic forgetting. Experiments with diverse brain lesion datasets demonstrate that CLMU-Net significantly outperforms conventional CL baselines, especially under stringent memory budgets and heterogeneous modality conditions (Sadegheih et al., 20 Jan 2026).

1. Network Structure and Data Flow

CLMU-Net employs a dynamically adaptable 3D U-Net architecture that ingests patches of size $128^3$ voxels, with an input channel dimension matching the maximum number of modalities $K_{\max}(t)$ encountered during training up to episode $t$. The input layer receives zero-filled channels for unavailable modalities, allowing seamless on-the-fly expansion. The encoder consists of four Conv3D-ReLU blocks with spatial stride 2, doubling feature depth from 32 to 256. At the bottleneck, the latent feature tensor $F \in \mathbb{R}^{256 \times 16 \times 16 \times 16}$ is reshaped for cross-attention with domain-conditioned textual embeddings. The decoder inverts this hierarchy with ConvTranspose3D blocks and skip connections, restoring full spatial resolution and outputting per-voxel lesion logits. Random Modality Drop (RMD) is applied during training by masking random subsets of available modalities to enforce robustness to missing sequences.
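The zero-fill convention and RMD masking can be sketched as follows (a minimal NumPy illustration; the array shapes and the drop probability are illustrative assumptions, not values from the paper):

```python
import numpy as np

def build_input(volumes, available, k_max, rng=None, drop_p=0.3):
    """Assemble a fixed-channel input tensor from a variable modality subset.

    volumes:   dict mapping modality index -> 3D array (D, H, W)
    available: set of modality indices present for this case
    k_max:     current maximum channel count K_max(t)
    drop_p:    Random Modality Drop probability (illustrative value)
    """
    rng = rng or np.random.default_rng()
    d, h, w = next(iter(volumes.values())).shape
    x = np.zeros((k_max, d, h, w), dtype=np.float32)  # zero-fill missing channels
    # RMD: randomly mask a subset of the available modalities during training,
    # always keeping at least one so the input is never all zeros.
    kept = [m for m in available if rng.random() > drop_p]
    if not kept:
        kept = [rng.choice(sorted(available))]
    for m in kept:
        x[m] = volumes[m]
    return x
```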

2. Modality-Flexible Channel Inflation

A central design principle is input layer inflation (ILI), permitting the network to accommodate a growing set of modalities without prior knowledge of the maximum. At each episode $t$, the input channel set is expanded if new modalities are observed: pre-trained weights are copied for existing channels, and newly required channels are zero-initialized. An arbitrary subset $M_i \subseteq \{1, \dots, K_{\max}(t)\}$ is mapped as:

$$X[m,:,:,:] = \begin{cases} I_m, & \text{if } m \in M_i \\ 0, & \text{otherwise} \end{cases}$$

This approach ensures the model can ingest and fuse varying modality subsets in a unified tensor, directly addressing the limitations of fixed-modality or maximum-set methods (Sadegheih et al., 20 Jan 2026).
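The inflation step for the first convolutional layer's weight tensor can be sketched as below (NumPy stand-in; beyond zero-initializing new channels, any initialization details are assumptions):

```python
import numpy as np

def inflate_input_weights(weight, k_new):
    """Expand a Conv3D input-layer weight from k_old to k_new input channels.

    weight: array of shape (out_ch, k_old, kD, kH, kW)
    k_new:  new channel count K_max(t) >= k_old
    Existing channel weights are copied verbatim; new channels start at zero,
    so the inflated layer initially computes the same output on old inputs.
    """
    out_ch, k_old, kd, kh, kw = weight.shape
    assert k_new >= k_old
    inflated = np.zeros((out_ch, k_new, kd, kh, kw), dtype=weight.dtype)
    inflated[:, :k_old] = weight  # preserve pre-trained weights
    return inflated
```

Because new channels are zero-initialized and missing modalities are zero-filled, inflating the layer leaves predictions on previously seen modality subsets unchanged.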

3. Domain-Conditioned Textual Guidance

To inject cohort and case-level context at the feature bottleneck, CLMU-Net incorporates domain-conditioned textual guidance (DCTG). For each patch, a prompt string encodes the lesion type and available modalities (e.g., "Lesion=[tumor], Modalities=[T1, FLAIR]"). This prompt is embedded using frozen BioBERT to obtain token representations $T \in \mathbb{R}^{N_t \times 768}$, which are linearly projected to $\mathbb{R}^{N_t \times 256}$. The bottleneck feature map is reshaped into a token sequence and fused with textual features via multi-head cross-attention, integrating global domain priors with local image features. The fused representation is reprojected and reshaped for the decoder, introducing an explicit mechanism for leveraging high-level semantic information regarding disease context and imaging protocol.
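The bottleneck fusion can be sketched as single-head cross-attention in NumPy (the paper uses multi-head attention; the head count, projection matrices, and dimensions here are illustrative assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(image_tokens, text_tokens, wq, wk, wv):
    """Fuse bottleneck image tokens with projected text tokens.

    image_tokens: (N_img, d) flattened 16x16x16 bottleneck features act as queries
    text_tokens:  (N_txt, d) projected BioBERT embeddings act as keys/values
    wq, wk, wv:   (d, d) projection matrices
    Returns fused image tokens of shape (N_img, d).
    """
    q = image_tokens @ wq
    k = text_tokens @ wk
    v = text_tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N_img, N_txt)
    return attn @ v  # each image token aggregates global textual context
```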

4. Lesion-Aware Experience Replay Buffer

Catastrophic forgetting is mitigated via a compact replay buffer maintained per episode and globally. After each dataset $D_t$, samples are ranked using complementary criteria:

  • Representative (R_rep): Combines normalized lesion size and mean lesion-region confidence, favoring large, confidently segmented lesions.
  • Difficult (R_diff): Blends normalized boundary uncertainty and morphological complexity (number of lesion components per voxel), emphasizing challenging cases.

For each episode, the top $\beta/2$ samples per criterion are selected for the buffer. The global buffer is updated by merging current selections and evicting the lowest-ranked samples globally to maintain a fixed total of $\beta$ samples. This jointly balanced selection yields up to a +2–5% mean Dice improvement over single-criterion selection [Table 3, (Sadegheih et al., 20 Jan 2026)].
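The dual-criterion selection and global eviction can be sketched in pure Python (the scoring callables are placeholders; the paper's actual ranking formulas combine lesion size, confidence, boundary uncertainty, and morphological complexity, and its exact eviction rule is an assumption here):

```python
def update_buffer(global_buffer, episode_samples, beta, rep_score, diff_score):
    """Select beta/2 representative and beta/2 difficult samples from the
    current episode, merge with the global buffer, and evict globally so
    the buffer never exceeds beta samples.

    rep_score, diff_score: callables mapping a sample to a float (higher = better).
    """
    half = beta // 2
    by_rep = sorted(episode_samples, key=rep_score, reverse=True)[:half]
    by_diff = sorted(episode_samples, key=diff_score, reverse=True)[:half]
    # Deduplicate while preserving order, then merge with the existing buffer.
    picked = list(dict.fromkeys(by_rep + by_diff))
    merged = list(dict.fromkeys(global_buffer + picked))
    # Evict lowest-ranked samples globally, ranking by the best of both criteria.
    merged.sort(key=lambda s: max(rep_score(s), diff_score(s)), reverse=True)
    return merged[:beta]
```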

5. Training Pipeline and Loss Formulation

The training process sequentially observes datasets with varying modalities and pathologies. Each episode includes:

  • Input layer inflation if new modalities arise.
  • 300 epochs over random $128^3$ voxel patches (batch size 2).
  • Adam optimizer ($\mathrm{lr} = 10^{-3}$).
  • For each mini-batch: apply RMD, sample a replay buffer batch if available, and compute a composite loss:

$$L_{\mathrm{seg}}(X) = \lambda_{\mathrm{ce}}\, L_{\mathrm{CE}}(\hat{y}, y) + \lambda_{\mathrm{dice}}\, \bigl(1 - \mathrm{Dice}(\hat{y}, y)\bigr)$$

with total loss per iteration

$$L_{\mathrm{total}} = L_{\mathrm{seg}}(X) + \mu\, L_{\mathrm{replay}}$$

with $\mu = 1$. Buffer selection and update are performed at the end of each episode.
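The composite segmentation loss can be sketched in NumPy for the binary case (the smoothing epsilon and equal weights $\lambda_{\mathrm{ce}} = \lambda_{\mathrm{dice}} = 1$ are illustrative assumptions):

```python
import numpy as np

def composite_seg_loss(probs, target, lam_ce=1.0, lam_dice=1.0, eps=1e-6):
    """Weighted cross-entropy + soft-Dice loss for binary voxel probabilities.

    probs:  predicted foreground probabilities in (0, 1), any shape
    target: binary ground-truth mask, same shape
    """
    probs = np.clip(probs, eps, 1 - eps)
    # Voxel-wise binary cross-entropy.
    ce = -np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs))
    # Soft Dice over the whole patch; eps guards against empty masks.
    inter = np.sum(probs * target)
    dice = (2 * inter + eps) / (np.sum(probs) + np.sum(target) + eps)
    return lam_ce * ce + lam_dice * (1 - dice)
```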

6. Empirical Evaluation and Results

CLMU-Net was evaluated on five heterogeneous 3D brain MRI cohorts: BRATS-Decathlon (tumors: T1, T1c, T2, FLAIR), ISLES (stroke: DWI), MSSEG (multiple sclerosis: T1, FLAIR, PD, T2), ATLAS (stroke lesions: T1), and WMH (white-matter hyperintensities: FLAIR). Experiments used two sequential dataset orders and standard intensity normalization and spatial resampling. Primary evaluation metrics were per-task Dice, AVG (final average Dice), ILM (mean Dice across all tasks), and BWT (backward transfer, with negative values indicating forgetting).

  • CLMU-Net + ILI + DCTG achieves mean AVG = 59.22%, ILM = 66.03%, BWT = –9.23 (for buffer size $\beta = 10$).
  • Best rehearsal baseline (ER): AVG≈49.84, ILM≈58.61, BWT≈–21.15.
  • Buffer-free methods: AVG≈43.62%, ILM≈53.78%.
  • Relative improvements over ER: +21.5% AVG (β=10), +14.3% (β=20).
  • Dynamic ILI provides +3–7% AVG and ~40–50% BWT reduction over fixed-channel architectures; dual-criterion buffer yields ΔAVG ≈+2–5% (Sadegheih et al., 20 Jan 2026).

These results demonstrate robust performance gains and marked reductions in forgetting, especially under heterogeneous-modality conditions and memory constraints.

7. Contributions, Strengths, and Limitations

CLMU-Net establishes true modality-agnostic continual learning in medical segmentation by eliminating the need to predefine maximum modality sets and enabling seamless architectural expansion. Domain-conditioned textual guidance provides efficient integration of global semantic priors, and the targeted replay buffer, by jointly balancing prototypical and difficult cases, sharply curtails forgetting even under small memory budgets.

Notable limitations include reliance on fixed prototypical/difficult ratios, which may be sub-optimal for some tasks, and potential privacy concerns due to raw 3D sample storage. Extensions to generative or privacy-preserving replay, and generalization to new pathologies or non-MRI modalities, are suggested directions. The approach demonstrates that dynamic adaptation, explicit domain knowledge injection, and principled sample selection are critical for robust continual segmentation in clinical imaging (Sadegheih et al., 20 Jan 2026).
