TransAttUnet: Enhancing Medical Image Segmentation with Transformers
The research paper introduces TransAttUnet, a novel architecture that leverages attention mechanisms and transformers to improve medical image segmentation. The model addresses a key limitation of traditional convolutional encoder-decoder designs: their difficulty in modeling long-range dependencies, which stems from the intrinsic locality of convolutional operations.
Key Contributions
TransAttUnet integrates several innovative components:
- Transformer-based Attention: The architecture introduces a Transformer-based Self-Attention (TSA) mechanism that learns long-range contextual interactions among encoder features. Pairing TSA with Global Spatial Attention (GSA) strengthens the network's self-aware attention and improves semantic consistency across different feature representations (a minimal sketch of the self-attention idea follows this list).
- Multi-scale Skip Connections: The model adds multi-scale skip connections that aggregate features from different semantic scales throughout the decoder. This aggregation helps preserve fine details in the segmentation mask, mitigating a common failure mode of conventional networks in which critical information is lost during up- and downsampling (a sketch of the aggregation appears below).
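To make the first component concrete, here is a minimal PyTorch sketch of Transformer-style self-attention applied to a CNN feature map: the map is flattened into a sequence of per-pixel tokens, attended over, and reshaped back. The class name, hyperparameters, and residual wiring are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch only: Transformer-style self-attention over CNN
# encoder features. Names and hyperparameters are assumptions, not the
# authors' code.
import torch
import torch.nn as nn

class TransformerSelfAttention(nn.Module):
    """Multi-head self-attention over a flattened feature map, so every
    spatial position can attend to every other one (long-range context)."""

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)    # (B, H*W, C): one token per pixel
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)    # residual + norm, standard Transformer style
        return tokens.transpose(1, 2).reshape(b, c, h, w)

# Example: attention over a 32x32 bottleneck feature map with 256 channels.
feats = torch.randn(1, 256, 32, 32)
print(TransformerSelfAttention(channels=256)(feats).shape)  # torch.Size([1, 256, 32, 32])
```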
These components are designed to be complementary, jointly mitigating the detail loss and limited contextual understanding observed in previous models.
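Similarly, here is a minimal sketch of multi-scale skip aggregation, under the assumption that decoder features from several scales are projected to a common width, upsampled to the finest resolution, and fused by concatenation; the paper's exact wiring may differ.

```python
# Illustrative sketch only: fuse decoder features from several semantic
# scales so the prediction sees both coarse context and fine detail.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAggregation(nn.Module):
    def __init__(self, in_channels: list[int], out_channels: int):
        super().__init__()
        # 1x1 convs project each scale to a common channel width before fusion.
        self.projs = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.fuse = nn.Conv2d(out_channels * len(in_channels), out_channels, 3, padding=1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        target = feats[0].shape[-2:]  # finest (largest) resolution
        ups = [F.interpolate(proj(f), size=target, mode="bilinear", align_corners=False)
               for proj, f in zip(self.projs, feats)]
        return self.fuse(torch.cat(ups, dim=1))

# Example: fuse decoder features at 128x128, 64x64, and 32x32.
f1 = torch.randn(1, 64, 128, 128)
f2 = torch.randn(1, 128, 64, 64)
f3 = torch.randn(1, 256, 32, 32)
print(MultiScaleAggregation([64, 128, 256], out_channels=64)([f1, f2, f3]).shape)
# torch.Size([1, 64, 128, 128])
```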
Experimental Evaluation
Extensive experiments were conducted on multiple benchmark datasets, including ISIC-2018 for skin lesion segmentation, a lung segmentation benchmark combining the JSRT, Montgomery, and NIH datasets, and the Clean-CC-CCII dataset for COVID-19 pneumonia segmentation, among others. TransAttUnet was evaluated against a range of baselines, including traditional CNN-based methods and recent Transformer-based models.
Results indicate that TransAttUnet consistently outperformed state-of-the-art models across these datasets. On ISIC-2018, for instance, it achieved a Dice coefficient of 90.74%, surpassing both attention-guided and multi-scale context models. These gains are attributed to TransAttUnet's ability to capture and exploit global context, which helps it succeed where vanilla U-Net and various attention-based U-Net variants fall short.
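For reference, the Dice coefficient used above measures mask overlap: twice the intersection of prediction and ground truth, divided by the total foreground of both. A minimal NumPy version for binary masks (the epsilon smoothing term is a common convention, not taken from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2 * |pred ∩ target| / (|pred| + |target|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Example: masks with 4 and 3 foreground pixels sharing 3 -> 2*3/(4+3) ≈ 0.86.
a = np.array([[1, 1], [1, 1]])
b = np.array([[1, 1], [1, 0]])
print(round(dice_coefficient(a, b), 2))  # 0.86
```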
Implications and Future Directions
From a practical standpoint, TransAttUnet sets a new benchmark for medical image segmentation, demonstrating clear gains in accuracy and robustness. Incorporating transformers into a U-Net framework offers a pathway for bringing advanced attention mechanisms into medical imaging tasks, potentially improving diagnosis and treatment planning.
On the theoretical side, the work validates combining traditional CNN architectures with transformer models to offset the limitations inherent in each: transformers excel at capturing global context, while CNNs remain proficient at local feature extraction. Together they offer a comprehensive approach to complex image segmentation challenges.
Future research could explore optimizing the computational efficiency of TransAttUnet. Transformers are known for their high computational demands, which may limit their applicability in resource-constrained environments; investigating lightweight transformer components or efficient attention approximations could make the approach more practical for widespread clinical use. Applying TransAttUnet to broader medical imaging contexts, or to other domains requiring precise segmentation, presents another avenue for future work.