TransBTS: Multimodal Brain Tumor Segmentation Using Transformer (2103.04430v2)

Published 7 Mar 2021 in cs.CV and cs.AI

Abstract: Transformer, which can benefit from global (long-range) information modeling using self-attention mechanisms, has been successful in natural language processing and 2D image classification recently. However, both local and global features are crucial for dense prediction tasks, especially for 3D medical image segmentation. In this paper, we for the first time exploit Transformer in 3D CNN for MRI Brain Tumor Segmentation and propose a novel network named TransBTS based on the encoder-decoder structure. To capture the local 3D context information, the encoder first utilizes 3D CNN to extract the volumetric spatial feature maps. Meanwhile, the feature maps are reformed elaborately for tokens that are fed into Transformer for global feature modeling. The decoder leverages the features embedded by Transformer and performs progressive upsampling to predict the detailed segmentation map. Extensive experimental results on both BraTS 2019 and 2020 datasets show that TransBTS achieves comparable or higher results than previous state-of-the-art 3D methods for brain tumor segmentation on 3D MRI scans. The source code is available at https://github.com/Wenxuan-1119/TransBTS

TransBTS: Advancements in Multimodal Brain Tumor Segmentation Using Transformers

The paper "TransBTS: Multimodal Brain Tumor Segmentation Using Transformer" introduces a novel approach for brain tumor segmentation in 3D MRI scans by leveraging the capabilities of Transformers integrated into a 3D Convolutional Neural Network (CNN) framework. The primary objective is to enhance the segmentation accuracy by effectively capturing both local and global contextual information through the hybrid utilization of 3D CNN and Transformer architectures.

Methodological Overview

TransBTS employs a Transformer-enhanced 3D CNN within an encoder-decoder architecture designed specifically for brain tumor segmentation on 3D MRI data. The encoder first extracts volumetric spatial features with 3D convolutions, which is essential for modeling local 3D context. The resulting feature maps are then reshaped into a sequence of tokens and fed into a Transformer, which models global feature dependencies through self-attention. The decoder, in turn, takes the Transformer-encoded features and progressively upsamples them to produce a detailed, full-resolution segmentation map.
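
To make this data flow concrete, the following is a minimal PyTorch-style sketch of such a hybrid encoder-Transformer-decoder. It is an illustrative assumption rather than the released TransBTS implementation: the class name HybridSegNet, the channel widths, the number of downsampling stages, and the Transformer configuration are placeholders, and the learnable positional embedding described in the paper is omitted for brevity.

```python
import torch
import torch.nn as nn

class HybridSegNet(nn.Module):
    """Sketch of a 3D-CNN encoder -> Transformer bottleneck -> upsampling decoder."""

    def __init__(self, in_ch=4, feat_ch=128, embed_dim=512, depth=4, num_classes=4):
        super().__init__()
        # 3D CNN encoder: captures local volumetric context and downsamples by 8x.
        self.cnn_encoder = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(64, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Linear projection turns each voxel of the low-resolution map into a token.
        self.proj = nn.Linear(feat_ch, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.unproj = nn.Linear(embed_dim, feat_ch)
        # Decoder: progressive upsampling back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(feat_ch, 64, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, num_classes, 2, stride=2),
        )

    def forward(self, x):                      # x: (B, 4, D, H, W) multimodal MRI
        f = self.cnn_encoder(x)                # (B, C, D/8, H/8, W/8) local features
        B, C, d, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, d*h*w, C) token sequence
        tokens = self.transformer(self.proj(tokens))
        f = self.unproj(tokens).transpose(1, 2).reshape(B, C, d, h, w)
        return self.decoder(f)                 # (B, num_classes, D, H, W) logits
```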

A central concern of the architectural design is balancing the representational strengths of CNNs and Transformers: TransBTS retains the local feature encoding that convolutions provide implicitly, while exploiting the long-range dependencies that self-attention makes tractable.

Experimental Evidence and Evaluation

The performance of TransBTS is validated on the BraTS 2019 and 2020 datasets, which are standard benchmarks for brain tumor segmentation. TransBTS achieves competitive or superior results compared to existing state-of-the-art methods, reaching Dice scores of up to 78.93% for the enhancing tumor, 90.00% for the whole tumor, and 81.94% for the tumor core on the BraTS 2019 validation set when test-time augmentation (TTA) is applied. These results indicate the framework's robust feature modeling capability.
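
For reference, the Dice score measures the overlap between a predicted and a ground-truth mask. The helper below is a generic sketch (a hypothetical dice_score function, not the official BraTS evaluation code):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Dice coefficient between two binary volumetric masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Per-region scores (enhancing tumor, whole tumor, tumor core) are obtained by
# binarizing the multi-class prediction into each region before calling dice_score.
```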

Additionally, the paper presents a comprehensive ablation study investigating architectural choices such as the token sequence length and the scale parameters (depth and embedding dimension) of the Transformer component. These experiments show that tuning these parameters yields a favorable balance between the architecture's complexity and its performance.
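
The token sequence length is tied directly to how far the CNN encoder downsamples the volume, which is why it appears in the ablations. The arithmetic sketch below uses assumed numbers (a 128-voxel cubic crop and a rough 12·d² parameter estimate per Transformer layer), not values reported in the paper.

```python
# Token count as a function of the encoder's downsampling factor (assumed 128^3 crop).
input_size = 128
for downsample in (4, 8, 16):
    side = input_size // downsample
    print(f"1/{downsample} resolution -> {side ** 3} tokens per volume")

# Rough Transformer size as a function of depth and embedding dimension d:
# ~4*d^2 weights for the attention projections plus ~8*d^2 for the MLP, per layer.
def approx_transformer_params(depth: int, embed_dim: int) -> int:
    return depth * 12 * embed_dim ** 2

print(f"{approx_transformer_params(4, 512) / 1e6:.1f}M parameters (illustrative)")
```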

Model Complexity and Practicality

TransBTS strikes a balance between performance and computational efficiency, with 32.99M parameters and 333G FLOPs, placing it in the range of a moderately sized model. The paper further proposes a lighter variant of TransBTS with roughly 54.11% fewer parameters and 37.54% fewer FLOPs that maintains comparable performance.
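
Parameter counts like these are straightforward to reproduce for any PyTorch module; FLOPs are usually estimated with a separate profiling tool. The snippet below is a generic sketch applied to a stand-in layer, not to the actual TransBTS network.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

toy = nn.Conv3d(4, 32, kernel_size=3)  # stand-in module for illustration
print(f"{count_parameters(toy):,} trainable parameters")
```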

Implications and Future Directions

The introduction of TransBTS marks a pivotal moment in brain tumor segmentation, expanding the frontier of transformer applications beyond their traditional domains and into 3D volumetric medical image analysis. The successful integration and demonstration of transformers with 3D CNNs open doors to further exploration and refinement of such hybrid models.

A plausible future direction involves enhancing the efficiency of attention mechanisms within transformers to advance computational and memory efficiency. Such improvements could lead to more scalable models, capable of handling the growing complexities in real-world medical imaging datasets.

In conclusion, TransBTS sets a formidable baseline for future endeavors in 3D medical imaging. By adopting a flexible network structure that marries 3D local feature extraction with global context modeling via transformers, this approach broadens the scope of segmentation capabilities and elucidates the potential applications of attention mechanisms in medical diagnostics.

Authors (6)
  1. Wenxuan Wang (128 papers)
  2. Chen Chen (753 papers)
  3. Meng Ding (27 papers)
  4. Jiangyun Li (14 papers)
  5. Hong Yu (114 papers)
  6. Sen Zha (5 papers)
Citations (619)