TransDAE: Dual Attention Mechanism in a Hierarchical Transformer for Efficient Medical Image Segmentation (2409.02018v1)
Abstract: In healthcare, medical image segmentation is crucial for accurate disease diagnosis and the development of effective treatment strategies. Early detection can significantly aid in managing diseases and potentially prevent their progression. Machine learning, particularly deep convolutional neural networks, has emerged as a promising approach to addressing segmentation challenges. Traditional methods like U-Net use encoding blocks for local representation modeling and decoding blocks to uncover semantic relationships. However, these models often struggle with multi-scale objects exhibiting significant variations in texture and shape, and they frequently fail to capture long-range dependencies in the input data. Transformers designed for sequence-to-sequence predictions have been proposed as alternatives, utilizing global self-attention mechanisms. Yet, they can sometimes lack precise localization due to insufficient granular details. To overcome these limitations, we introduce TransDAE: a novel approach that reimagines the self-attention mechanism to include both spatial and channel-wise associations across the entire feature space, while maintaining computational efficiency. Additionally, TransDAE enhances the skip connection pathway with an inter-scale interaction module, promoting feature reuse and improving localization accuracy. Remarkably, TransDAE outperforms existing state-of-the-art methods on the Synaps multi-organ dataset, even without relying on pre-trained weights.
- Segpc-2021: A challenge & dataset on segmentation of multiple myeloma plasma cells from microscopic images. Medical Image Analysis, 83:102677, 2023.
- D-former: A u-shaped dilated transformer for 3d medical image segmentation. arXiv preprint arXiv:2201.00462, 2022.
- Pyramid medical transformer for medical image segmentation. arXiv preprint arXiv:2104.14702, 2021.
- U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
- Resunet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS Journal of Photogrammetry and Remote Sensing, 162:94–114, 2020.
- Denseunet: densely connected unet for electron microscopy image segmentation. IET Image Processing, 14(12):2682–2689, 2020.
- Unet++: A nested u-net architecture for medical image segmentation. In Deep learning in medical image analysis and multimodal learning for clinical decision support, pages 3–11. Springer, 2018.
- Unet 3+: A full-scale connected unet for medical image segmentation. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1055–1059. IEEE, 2020.
- Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
- Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2881–2890, 2017.
- Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999, 2018.
- Deep learning prediction of response to anti-vegf among diabetic macular edema patients: Treatment response analyzer system (tras). Diagnostics, 12(2):312, 2022.
- Attention is all you need. Advances in neural information processing systems, 30, 2017.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Laplacian-former: Overcoming the limitations of vision transformers in local texture detection. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 736–746. Springer, 2023.
- Medical image segmentation review: The success of u-net. arXiv preprint arXiv:2211.14830, 2022.
- 3d u-net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, pages 424–432. Springer, 2016.
- Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7794–7803, 2018.
- Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3146–3154, 2019.
- Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
- Levit: a vision transformer in convnet’s clothing for faster inference. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12259–12269, 2021.
- Twins: Revisiting the design of spatial attention in vision transformers. Advances in Neural Information Processing Systems, 34, 2021.
- Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306, 2021.
- Swin-unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537, 2021.
- Fat-net: Feature adaptive transformers for automated skin lesion segmentation. Medical Image Analysis, 76:102327, 2022.
- Improving fhb screening in wheat breeding using an efficient transformer model. In 2023 ASABE Annual International Meeting, page 1. American Society of Agricultural and Biological Engineers, 2023.
- Transbts: Multimodal brain tumor segmentation using transformer. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, pages 109–119. Springer, 2021.
- Missformer: An effective transformer for 2d medical image segmentation. IEEE Transactions on Medical Imaging, pages 1–1, 2022.
- Efficient attention: Attention with linear complexities. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 3531–3539, 2021.
- Self-attention generative adversarial networks. In International conference on machine learning, pages 7354–7363. PMLR, 2019.
- Enhancing medical image segmentation with transception: A multi-scale feature fusion approach. arXiv preprint arXiv:2301.10847, 2023.
- Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pages 3–19, 2018.
- Residual attention network for image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3156–3164, 2017.
- Bam: Bottleneck attention module. arXiv preprint arXiv:1807.06514, 2018.
- Visual attention network. Computational Visual Media, pages 1–20, 2023.
- Miccai multi-atlas labeling beyond the cranial vault–workshop and challenge. In Proc. MICCAI Multi-Atlas Labeling Beyond Cranial Vault—Workshop Challenge, volume 5, page 12, 2015.
- Attention gated networks: Learning to leverage salient regions in medical images. Medical image analysis, 53:197–207, 2019.
- Levit-unet: Make faster encoders with transformer for medical image segmentation. arXiv preprint arXiv:2107.08623, 2021.
- Mixed transformer u-net for medical image segmentation. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2390–2394. IEEE, 2022.
- Transdeeplab: Convolution-free transformer-based deeplab v3+ for medical image segmentation. In Predictive Intelligence in Medicine, pages 91–102. Springer Nature Switzerland, 2022.
- Ffunet: A novel feature fusion makes strong decoder for medical image segmentation. IET Signal Processing, 16(5):501–514, 2022.
- Hiformer: Hierarchical multi-scale representations using transformers for medical image segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 6202–6212, 2023.
- Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017.
Sponsor
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.