Multi-Modal Brain Tumor Segmentation via 3D Multi-Scale Self-attention and Cross-attention (2504.09088v1)

Published 12 Apr 2025 in eess.IV and cs.CV

Abstract: Due to the success of CNN-based and Transformer-based models in various computer vision tasks, recent works have studied the applicability of CNN-Transformer hybrid architectures to 3D multi-modality medical segmentation tasks. Introducing the Transformer gives hybrid models the ability to model long-range dependencies in 3D medical images through the self-attention mechanism. However, these models usually employ a fixed receptive field over 3D volumetric features within each self-attention layer, ignoring multi-scale features of volumetric lesions. To address this issue, we propose a CNN-Transformer hybrid 3D medical image segmentation model with an encoder-decoder structure, named TMA-TransBTS. TMA-TransBTS simultaneously extracts multi-scale 3D features and models long-distance dependencies by dividing 3D tokens into multiple scales and aggregating them within a self-attention layer. Furthermore, TMA-TransBTS introduces a 3D multi-scale cross-attention module that links the encoder and the decoder, extracting rich volumetric representations by exploiting the mutual attention mechanism of cross-attention together with multi-scale aggregation of 3D tokens. Extensive experiments on three public 3D medical segmentation datasets show that TMA-TransBTS achieves higher average segmentation performance than previous state-of-the-art CNN-based 3D methods and CNN-Transformer hybrid 3D methods for the segmentation of 3D multi-modality brain tumors.
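
The abstract does not specify exactly how 3D tokens are divided and aggregated across scales, so the following is only a minimal NumPy sketch of the general idea, not the authors' TMA-TransBTS implementation. It assumes a single attention head, non-overlapping average pooling of the key/value volume at a few hypothetical pooling rates, and projection matrices shared across scales. Queries stay at full resolution, so each voxel token attends to both fine and coarse tokens. The same function covers self-attention (queries and keys/values from one volume) and encoder-decoder cross-attention (queries from a decoder volume, keys/values from an encoder volume). All function names and parameters (`pool3d`, `multiscale_attention`, `rates`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool3d(vol, r):
    """Average-pool a (D, H, W, C) volume with a non-overlapping r x r x r window."""
    D, H, W, C = vol.shape
    assert D % r == 0 and H % r == 0 and W % r == 0
    v = vol.reshape(D // r, r, H // r, r, W // r, r, C)
    return v.mean(axis=(1, 3, 5))

def multiscale_tokens(vol, rates):
    """Flatten the volume pooled at each rate into tokens; concatenate all scales."""
    C = vol.shape[-1]
    return np.concatenate([pool3d(vol, r).reshape(-1, C) for r in rates], axis=0)

def multiscale_attention(q_vol, kv_vol, rates, Wq, Wk, Wv):
    """Full-resolution queries from q_vol attend to multi-scale tokens of kv_vol.

    Self-attention: pass the same volume as q_vol and kv_vol.
    Cross-attention: q_vol from the decoder, kv_vol from the encoder.
    """
    C = q_vol.shape[-1]
    q = q_vol.reshape(-1, C) @ Wq            # (Nq, d) query tokens
    kv = multiscale_tokens(kv_vol, rates)     # (Nk, C) fine + coarse tokens
    k, v = kv @ Wk, kv @ Wv                   # (Nk, d) keys and values
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))      # (Nq, Nk) scaled dot-product weights
    out = attn @ v                            # (Nq, d)
    return out.reshape(q_vol.shape[:-1] + (d,)), attn

# Usage on random 8x8x8 feature volumes with 8 channels (toy sizes, not from the paper).
rng = np.random.default_rng(0)
C, d = 8, 8
enc = rng.standard_normal((8, 8, 8, C))
dec = rng.standard_normal((8, 8, 8, C))
Wq, Wk, Wv = (rng.standard_normal((C, d)) / np.sqrt(C) for _ in range(3))

# Rates 1, 2, 4 give 512 + 64 + 8 = 584 key/value tokens per query.
self_out, self_attn = multiscale_attention(enc, enc, (1, 2, 4), Wq, Wk, Wv)
# Rates 2, 4 give 64 + 8 = 72 encoder tokens for each decoder query.
cross_out, cross_attn = multiscale_attention(dec, enc, (2, 4), Wq, Wk, Wv)
```

Pooling only the keys and values keeps the output at the query resolution while shrinking the attention matrix at coarse scales; a real model would typically add multiple heads, per-scale projections, and learned downsampling instead of plain averaging.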


Authors (3)

Yonghao Huang
Leiting Chen
Chuan Zhou
