BEFUnet: A Hybrid CNN-Transformer Architecture for Precise Medical Image Segmentation (2402.08793v1)
Abstract: Accurate segmentation of medical images is critical for many healthcare applications. Convolutional neural networks (CNNs), especially fully convolutional networks (FCNs) such as U-Net, have shown remarkable success in medical image segmentation. However, they struggle to capture global context and long-range relations, especially for objects that vary significantly in shape, scale, and texture. Transformers have achieved state-of-the-art results in natural language processing and image recognition, but they face challenges in medical image segmentation because they lack the built-in locality and translation invariance of convolutions. To address these challenges, this paper proposes BEFUnet, a U-shaped network that enhances the fusion of body and edge information for precise medical image segmentation. BEFUnet comprises three main modules: a novel Local Cross-Attention Feature (LCAF) fusion module, a novel Double-Level Fusion (DLF) module, and a dual-branch encoder. The dual-branch encoder consists of an edge encoder and a body encoder. The edge encoder employs pixel difference convolution (PDC) blocks to extract edge information, while the body encoder uses the Swin Transformer to capture semantic information with global attention. The LCAF module fuses edge and body features by selectively performing local cross-attention on features that are spatially close between the two branches. This local approach significantly reduces computational complexity compared to global cross-attention while ensuring accurate feature matching. BEFUnet demonstrates superior performance over existing methods across various evaluation metrics on medical image segmentation datasets.
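The idea of local cross-attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' LCAF implementation: it assumes that "spatially close" means a square window of a hypothetical `radius` around each position, it omits the learned query, key, and value projections a real module would have, and the function names `local_cross_attention` and `attention_pairs` are invented for illustration. Body features act as queries and edge features as keys and values.

```python
import math


def local_cross_attention(body, edge, radius=1):
    """Fuse two H x W feature grids with windowed cross-attention.

    body, edge: H x W grids (nested lists) of equal-length float vectors.
    Each body vector (query) attends only to edge vectors (keys/values)
    within `radius` pixels. Assumption: no learned projections are used.
    """
    h, w = len(body), len(body[0])
    dim = len(body[0][0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for i in range(h):
        row = []
        for j in range(w):
            q = body[i][j]
            # Edge features inside the local window, clipped at the borders.
            neighbours = [
                edge[y][x]
                for y in range(max(0, i - radius), min(h, i + radius + 1))
                for x in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            # Scaled dot-product scores followed by a stable softmax.
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in neighbours]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # Attention-weighted sum of the neighbouring edge vectors.
            row.append([
                sum(wt * v[c] for wt, v in zip(weights, neighbours))
                for c in range(dim)
            ])
        fused.append(row)
    return fused


def attention_pairs(h, w, radius=None):
    """Count query-key pairs: global if radius is None, otherwise local."""
    if radius is None:
        return (h * w) ** 2
    count = 0
    for i in range(h):
        for j in range(w):
            rows = min(h, i + radius + 1) - max(0, i - radius)
            cols = min(w, j + radius + 1) - max(0, j - radius)
            count += rows * cols
    return count
```

Under these assumptions, each query on an H x W map is compared against at most (2r+1)^2 keys instead of all H·W keys, which is where the reduction in computational cost relative to global cross-attention comes from.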
Authors: Omid Nejati Manzari, Javad Mirzapour Kaleybar, Hooman Saadat, Shahin Maleki