Analysis of "Medical Image Segmentation Using Squeeze-and-Expansion Transformers"
The paper "Medical Image Segmentation Using Squeeze-and-Expansion Transformers" by Shaohua Li et al. introduces Segtran, a segmentation framework for medical images. The authors address a key limitation of conventional methods such as U-Net and its variants, whose effective receptive fields remain limited because they rely on local image cues. Segtran instead leverages transformers to process global context without sacrificing spatial resolution, integrating a squeeze-and-expansion mechanism into the transformer architecture.
Segtran Architecture Overview
The Segtran architecture is centered on a custom Squeeze-and-Expansion Transformer with two novel components: a Squeezed Attention Block (SAB) and an Expanded Attention Block (EAB). The SAB regularizes attention by compressing the expensive pairwise attention computation typical of transformers, while the EAB acts as a mixture-of-experts model; together they help the transformer capture the diverse data variations inherent in medical imaging tasks. In addition, a learnable sinusoidal positional encoding gives the transformer an inductive bias toward the spatial continuity of images.
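To make the squeeze idea concrete, here is a minimal NumPy sketch under the assumption that squeezed attention routes the full pairwise attention through a small set of learned code tokens, reducing the cost from O(N²) to O(N·k). The function and variable names (`squeezed_attention`, `codes`) are illustrative, not taken from the paper, and the sketch omits multi-head projections and other details of the actual SAB.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squeezed_attention(x, codes):
    """Attention squeezed through k learned code tokens (k << N).

    Instead of computing the full N x N attention among the N input
    tokens, the inputs are first summarized into k code tokens, then
    each input token attends back to that compact summary.
    """
    n, d = x.shape
    scale = np.sqrt(d)
    # Squeeze: the k codes attend to the N inputs -> (k, d) summary.
    squeezed = softmax(codes @ x.T / scale) @ x
    # Expand: the N inputs attend to the summary -> (n, d) output.
    return softmax(x @ squeezed.T / scale) @ squeezed

rng = np.random.default_rng(0)
x = rng.standard_normal((196, 64))      # e.g. 14x14 feature-map tokens
codes = rng.standard_normal((16, 64))   # k = 16 squeezed code tokens
out = squeezed_attention(x, codes)      # shape (196, 64)
```

Both attention maps here are N×k rather than N×N, which is the source of the regularization and the compute savings the squeeze is meant to provide.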
The framework also adopts a dual Feature Pyramid Network (FPN) configuration to handle the varying spatial resolutions of input medical images, ensuring that the transformer's unlimited effective receptive field can be exploited without losing the fine-grained detail required for precise segmentation.
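The FPN side of the design can be illustrated with a generic top-down fusion pass: coarse, semantically rich features are repeatedly upsampled and added to finer levels, so every level mixes global context with spatial detail. This is a standard FPN sketch in NumPy, not Segtran's exact dual-FPN wiring; the helper names and channel counts are illustrative.

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(features):
    """Top-down pathway of a Feature Pyramid Network.

    `features` is ordered fine -> coarse (e.g. strides 4, 8, 16).
    The coarsest map is upsampled and added into each finer level,
    propagating global context down to high-resolution features.
    """
    out = [features[-1]]
    for finer in reversed(features[:-1]):
        out.append(finer + upsample2x(out[-1]))
    return list(reversed(out))  # back to fine -> coarse order

rng = np.random.default_rng(1)
c = 8  # illustrative channel count
pyramid = [rng.standard_normal((c, 64 // s, 64 // s)) for s in (4, 8, 16)]
fused = fpn_fuse(pyramid)  # same shapes as the input pyramid
```

In Segtran, one FPN prepares multi-scale backbone features for the transformer and another reconstructs the high-resolution segmentation output, but both rely on this kind of upsample-and-merge pathway.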
Empirical Evaluation and Results
The authors conducted experiments on three segmentation tasks: optic disc/cup segmentation in fundus images (REFUGE'20 challenge), polyp segmentation in colonoscopy images, and brain tumor segmentation in MRI scans (BraTS'19 challenge). Segtran demonstrated superior segmentation accuracy and cross-domain generalization compared with existing methods, including enhanced U-Net variants, PraNet, DeepLabV3+, and other transformer-based models such as SETR and TransU-Net.
The numerical results show Segtran outperforming its counterparts, with particularly strong performance at retaining spatial detail while capitalizing on global context. Segtran also adapted well to unseen domains, an essential attribute given the diversity of real-world medical imaging environments.
Implications and Future Directions
The introduction of Segtran marks a significant enhancement in transformer-based approaches for medical image segmentation. By addressing key challenges such as the balance between global and local context processing, Segtran creates a foundation upon which future research can build. The Squeeze-and-Expansion Transformer stands out due to its architectural innovations, potentially influencing subsequent advances in transformer adaptations for other vision-related tasks.
Looking ahead, exploring Segtran's potential in real-time applications, optimizing it to reduce computational overhead, and evaluating it on other medical imaging modalities could all be fruitful directions. Moreover, the adaptability of Segtran's design invites application in other domains facing a similar trade-off between global context and local detail, extending its relevance beyond medical imaging.
The comprehensive results and methodical evaluation presented in this paper demonstrate the efficacy of Segtran and underscore the crucial role of architectural innovation in advancing the frontiers of computational imaging segmentation.