Multimodal Medical Transformer for Brain Tumor Segmentation
The paper addresses the challenging problem of brain tumor segmentation from Magnetic Resonance Imaging (MRI), focusing on the common clinical scenario in which not all of the standard MRI modalities are available. The proposed model, mmFormer, combines convolutional networks with Transformers so that segmentation remains robust even when the multimodal input is incomplete.
The central innovation of mmFormer is its ability to segment brain tumors accurately from any subset of the available MRI modalities. This is a significant advance, since missing modalities are common in clinical settings owing to variable scanning protocols and patient conditions. The architecture is composed of several key components: hybrid modality-specific encoders, an inter-modal Transformer, and a convolutional decoder, complemented by auxiliary regularizers that enhance robustness.
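To make the idea of segmenting from an arbitrary subset of modalities concrete, the sketch below shows one way an incomplete multimodal input could be represented, assuming the four BraTS modalities. The zero-filling of missing channels and the boolean availability mask are illustrative assumptions, not necessarily mmFormer's exact input handling.

```python
import torch

MODALITIES = ["flair", "t1", "t1ce", "t2"]

def build_input(available: dict, shape=(128, 128, 128)):
    """Stack the available modalities in a fixed order, zero-fill the missing
    ones, and return a boolean mask marking which channels are present."""
    volumes, mask = [], []
    for name in MODALITIES:
        if name in available:
            volumes.append(available[name])
            mask.append(True)
        else:
            volumes.append(torch.zeros(shape))
            mask.append(False)
    return torch.stack(volumes, dim=0), torch.tensor(mask)

# Example: only FLAIR and T2 were acquired for this patient.
x, present = build_input({"flair": torch.randn(128, 128, 128),
                          "t2": torch.randn(128, 128, 128)})
print(x.shape, present)  # torch.Size([4, 128, 128, 128]), [True, False, False, True]
```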
Key Architectural Components
- Hybrid Modality-Specific Encoders: Each encoder combines convolutional layers, which extract local features, with an intra-modal Transformer that models long-range dependencies, so that both local and global context are captured within each MRI modality.
- Inter-Modal Transformer: This module is pivotal for building correlations between different modalities. By aggregating features from each modality-specific encoder, the inter-modal Transformer generates modality-invariant features, ensuring robust performance even with incomplete data.
- Convolutional Decoder: The decoder progressively upsamples and fuses the encoded features to restore full spatial resolution and produce the segmentation masks, yielding accurate delineation of the tumor regions.
- Auxiliary Regularizers: These are introduced in both the encoder and decoder stages to foster robustness, encouraging the model to learn discriminative features even when modalities are missing. A condensed sketch of how these components might fit together is given below.
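The following PyTorch sketch wires the pieces together at a high level: a per-modality hybrid encoder (3D convolutions plus an intra-modal Transformer), an inter-modal Transformer over the concatenated tokens, and a convolutional decoder. Layer counts, channel widths, the zeroing of tokens from missing modalities, and the omission of the auxiliary regularizers are simplifying assumptions for illustration; this is not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HybridEncoder(nn.Module):
    """Per-modality encoder: a 3D conv stem for local features, followed by an
    intra-modal Transformer over the downsampled feature map (as tokens)."""
    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, dim, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv3d(dim, dim, 3, stride=2, padding=1), nn.GELU(),
        )
        self.intra = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)

    def forward(self, x):                       # x: (B, 1, D, H, W)
        f = self.conv(x)                        # (B, C, D/4, H/4, W/4)
        b, c, *spatial = f.shape
        tokens = f.flatten(2).transpose(1, 2)   # (B, N, C) with N = D/4 * H/4 * W/4
        return self.intra(tokens), spatial

class MMFormerSketch(nn.Module):
    def __init__(self, num_modalities=4, dim=64, num_classes=4):
        super().__init__()
        self.encoders = nn.ModuleList(HybridEncoder(dim) for _ in range(num_modalities))
        self.inter = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.decoder = nn.Sequential(            # progressive upsampling back to input size
            nn.ConvTranspose3d(dim, dim, 2, stride=2), nn.GELU(),
            nn.ConvTranspose3d(dim, dim, 2, stride=2), nn.GELU(),
            nn.Conv3d(dim, num_classes, 1))

    def forward(self, x, present):               # x: (B, M, D, H, W); present: (M,) bool
        tokens_per_mod, spatial = [], None
        for m, enc in enumerate(self.encoders):
            t, spatial = enc(x[:, m:m + 1])
            if not present[m]:                    # crude handling of a missing modality:
                t = torch.zeros_like(t)           # drop its tokens (an assumption made here)
            tokens_per_mod.append(t)
        fused = self.inter(torch.cat(tokens_per_mod, dim=1))    # inter-modal attention
        # Average the per-modality token groups back into a single feature map.
        fused = fused.reshape(x.size(0), len(self.encoders), -1, fused.size(-1)).mean(1)
        f = fused.transpose(1, 2).reshape(x.size(0), -1, *spatial)
        return self.decoder(f)                    # (B, num_classes, D, H, W)

# Smoke test on a small volume with two of the four modalities available.
model = MMFormerSketch()
out = model(torch.randn(1, 4, 32, 32, 32), torch.tensor([True, False, False, True]))
print(out.shape)  # torch.Size([1, 4, 32, 32, 32])
```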
Experimental Validation
The model is evaluated on the BraTS 2018 dataset, a standard benchmark for multimodal brain tumor segmentation. mmFormer outperforms existing methods for incomplete modalities such as HeMIS and U-HVED. Notably, it achieves an average improvement of 19.07% in Dice similarity coefficient (DSC) for enhancing tumor segmentation when only one modality is available, underscoring its efficacy. Furthermore, mmFormer's performance approaches that of the more computationally demanding ACN, which trains a dedicated model for each missing-modality combination, whereas mmFormer handles all combinations with a single model.
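For reference, the Dice similarity coefficient used in this evaluation measures the voxel overlap between a predicted mask P and the ground truth G, DSC = 2|P ∩ G| / (|P| + |G|). A small reference implementation (not tied to any particular codebase):

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

# Example: two 4x4x4 masks that overlap on half of their voxels.
p = np.zeros((4, 4, 4), dtype=bool); p[:2] = True
g = np.zeros((4, 4, 4), dtype=bool); g[1:3] = True
print(round(dice(p, g), 3))  # 0.5
```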
Implications and Future Directions
The introduction of mmFormer sets a precedent for future research in medical image segmentation, particularly when the available imaging data are incomplete. By combining the strength of Transformers in capturing long-range dependencies with the efficiency of convolutional networks at local feature extraction, mmFormer offers a balanced and efficient approach to incomplete multimodal learning.
Future research may focus on extending this framework beyond brain imaging, exploring its applicability to other clinical imaging domains with multimodal data. Additionally, integrating advanced techniques in feature disentanglement and domain adaptation could further enhance the generalizability and robustness of such models.
In conclusion, mmFormer stands as a robust architecture for multimodal brain tumor segmentation, offering a practical solution to the pervasive issue of incomplete modalities in clinical practice. Its integration of Transformer and convolutional networks exemplifies a pragmatic approach to complex medical imaging tasks and paves the way for further innovations in the field.