
TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images (2201.12785v3)

Published 30 Jan 2022 in eess.IV and cs.CV

Abstract: Transformer, benefiting from global (long-range) information modeling using self-attention mechanism, has been successful in natural language processing and computer vision recently. Convolutional Neural Networks, capable of capturing local features, are difficult to model explicit long-distance dependencies from global feature space. However, both local and global features are crucial for dense prediction tasks, especially for 3D medical image segmentation. In this paper, we present the further attempt to exploit Transformer in 3D CNN for 3D medical image volumetric segmentation and propose a novel network named TransBTSV2 based on the encoder-decoder structure. Different from TransBTS, the proposed TransBTSV2 is not limited to brain tumor segmentation (BTS) but focuses on general medical image segmentation, providing a stronger and more efficient 3D baseline for volumetric segmentation of medical images. As a hybrid CNN-Transformer architecture, TransBTSV2 can achieve accurate segmentation of medical images without any pre-training, possessing the strong inductive bias as CNNs and powerful global context modeling ability as Transformer. With the proposed insight to redesign the internal structure of Transformer block and the introduced Deformable Bottleneck Module to capture shape-aware local details, a highly efficient architecture is achieved with superior performance. Extensive experimental results on four medical image datasets (BraTS 2019, BraTS 2020, LiTS 2017 and KiTS 2019) demonstrate that TransBTSV2 achieves comparable or better results compared to the state-of-the-art methods for the segmentation of brain tumor, liver tumor as well as kidney tumor. Code will be publicly available at https://github.com/Wenxuan-1119/TransBTS.

TransBTSV2: Advancements in Volumetric Medical Image Segmentation

The paper presents TransBTSV2, an evolved architecture designed to enhance the segmentation of volumetric medical images by integrating both Convolutional Neural Networks (CNNs) and Transformers. The convergence of these methodologies allows TransBTSV2 to capitalize on local feature extraction typical of CNNs, while also leveraging the global context modeling capabilities inherent in Transformer architectures.

Key Innovations

  1. Hybrid Architecture: Unlike its predecessor, TransBTS, which was tailored specifically for brain tumor segmentation, TransBTSV2 generalizes the application to broader medical image datasets. It harmonizes the CNN's local context capturing with the Transformer's long-range dependencies, thereby achieving comprehensive feature extraction.
  2. Redesigned Transformer Blocks: Addressing the high computational cost of stacking many Transformer layers, TransBTSV2 adopts a wider-rather-than-deeper design: fewer Transformer blocks with a larger embedding dimension. Combined with an inverted-bottleneck structure inspired by MobileNetV2, this redesign reduces parameters by 53.62% and FLOPs by 27.75% relative to TransBTS while still improving performance.
  3. Deformable Bottleneck Module (DBM): Inserted at the skip connections, this module addresses the irregular shapes and ambiguous boundaries common in tumor regions. By learning adaptive volumetric spatial offsets, the DBM improves the model's ability to delineate complex, shape-variable lesions.
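The parameter trade-off behind the wider-over-deeper redesign in item 2 can be illustrated with simple arithmetic: Transformer parameters grow linearly in depth but quadratically in width, so halving the depth while widening by less than a factor of sqrt(2) still shrinks the model. The depths and dimensions below are hypothetical stand-ins, not the paper's actual configuration.

```python
# Illustrative arithmetic (not the paper's exact configuration): parameter
# count of a stack of standard Transformer blocks, each with multi-head
# self-attention (4*d^2 weights for the Q/K/V/output projections) and a
# two-layer MLP with expansion ratio r (2*r*d^2 weights); biases ignored.
def transformer_stack_params(depth: int, d_model: int, mlp_ratio: int = 4) -> int:
    attn = 4 * d_model * d_model             # Wq, Wk, Wv, Wo
    mlp = 2 * mlp_ratio * d_model * d_model  # d -> r*d -> d
    return depth * (attn + mlp)

# A deep-narrow stack vs. a shallow-wide stack: parameters scale linearly
# with depth but quadratically with width, so this wider stack is smaller.
deep = transformer_stack_params(depth=8, d_model=384)
wide = transformer_stack_params(depth=4, d_model=512)
print(deep, wide, wide < deep)  # the wider, shallower stack uses fewer parameters
```

The same reasoning explains why the paper can widen its Transformer blocks and still report a net reduction in parameters and FLOPs.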
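The core idea behind deformable modules such as the DBM can be sketched as follows: instead of reading features at fixed integer grid positions, each output location samples the input at (position + learned offset) via interpolation. This is a 2D toy with a caller-supplied offset field standing in for a learned prediction; the paper's module operates on 3D volumes with trainable offsets, so this is a simplified illustration rather than the authors' implementation.

```python
import numpy as np

def bilinear_sample(feat: np.ndarray, y: float, x: float) -> float:
    """Sample a 2D feature map at a fractional (y, x) location, clamping to borders."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_resample(feat: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Resample feat at grid positions displaced by a per-location (dy, dx) field.

    offsets has shape (H, W, 2); in a real deformable module this field would
    be predicted by a small convolutional branch rather than given directly.
    """
    h, w = feat.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            dy, dx = offsets[i, j]
            out[i, j] = bilinear_sample(feat, i + dy, j + dx)
    return out
```

With an all-zero offset field the operation reduces to an identity copy; non-zero offsets let each location "look" toward an irregular lesion boundary instead of a fixed square neighborhood, which is the intuition behind shape-aware sampling.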

Empirical Evaluation and Results

The proposed model is evaluated across four datasets: BraTS 2019, BraTS 2020, LiTS 2017, and KiTS 2019, covering brain, liver, and kidney tumors. On the BraTS datasets, TransBTSV2 achieved Dice scores of 80.24% for enhancing tumor, 90.42% for whole tumor, and 84.87% for tumor core. These results represent notable improvements over contemporary architectures, validating the introduced techniques, particularly for global feature modeling and fine boundary delineation.
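The Dice score used throughout these evaluations measures volumetric overlap between a predicted mask P and the ground truth G as 2|P ∩ G| / (|P| + |G|). A minimal per-class implementation on flat binary arrays (an illustrative sketch, not the benchmark's official scoring code):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks: 2*|P & G| / (|P| + |G|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps avoids division by zero when both masks are empty
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))
```

A perfect prediction scores 1.0 and a fully disjoint one scores 0.0, so the 80–90% figures above indicate substantial but imperfect overlap, with the hardest region (enhancing tumor) scoring lowest, as is typical on BraTS.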

Further, on LiTS 2017 and KiTS 2019, TransBTSV2 outperformed existing methods, particularly in lesion segmentation accuracy, demonstrating the model's versatility across different organs and imaging modalities.

Broader Implications and Future Trajectories

TransBTSV2 sets a substantial foundation for hybrid models in medical image analysis. By avoiding both the data-heavy pre-training regimes typical of Transformers and the locality constraints of pure CNNs, the architecture balances computational efficiency with strong performance, pointing toward adaptable, precise segmentation solutions.

The insights gained from TransBTSV2 could catalyze subsequent research exploring wider adoption of adaptive hybrid architectures in medical imaging, potentially extending into real-time diagnostic applications or other imaging domains like histopathology.

Moving forward, exploring the integration of further architectural enhancements, such as attention-guided feature refinement or multi-scale processing, may further consolidate the hybrid model's performance across increasingly diverse applications.

In sum, the paper makes a compelling case for TransBTSV2 as both a significant step forward in volumetric medical image segmentation and a benchmark for future hybrid model development.

Authors (7)
  1. Jiangyun Li (14 papers)
  2. Wenxuan Wang (128 papers)
  3. Chen Chen (752 papers)
  4. Tianxiang Zhang (10 papers)
  5. Sen Zha (5 papers)
  6. Jing Wang (740 papers)
  7. Hong Yu (114 papers)
Citations (18)