Narrowing the semantic gaps in U-Net with learnable skip connections: The case of medical image segmentation (2312.15182v1)
Abstract: Most state-of-the-art methods for medical image segmentation adopt an encoder-decoder architecture. However, this U-shaped framework, with its simple skip connections, is still limited in capturing non-local multi-scale information. To address this problem, we first explore the potential weaknesses of skip connections in U-Net on multiple segmentation tasks and find that i) not all skip connections are useful, and each contributes differently; and ii) the optimal combination of skip connections differs from dataset to dataset. Based on these findings, we propose a new segmentation framework, named UDTransNet, to address three semantic gaps in U-Net. Specifically, we propose a Dual Attention Transformer (DAT) module that captures channel-wise and spatial-wise relationships to better fuse the encoder features, and a Decoder-guided Recalibration Attention (DRA) module that connects the DAT tokens with the decoder features to eliminate their inconsistency. Together, the two modules establish a learnable connection that narrows the semantic gaps between the encoder and the decoder, leading to a high-performance segmentation model for medical images. Comprehensive experimental results indicate that UDTransNet achieves higher evaluation scores and finer segmentation results with relatively fewer parameters than state-of-the-art segmentation methods on several public datasets. Code: https://github.com/McGregorWwww/UDTransNet.
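Findings i) and ii) imply an ablation over subsets of skip connections: switch each connection on or off, train, and compare scores per dataset. The sketch below only enumerates that search space and is not the authors' code. It assumes a four-level U-Net with one skip connection per level, and the names `skip_combinations`, `describe`, and `best_combination` are hypothetical. No scores are included, since they would have to come from actual training runs.

```python
from itertools import product

# Assumption: a four-level U-Net with one skip connection per level.
NUM_SKIPS = 4


def skip_combinations(num_skips=NUM_SKIPS):
    """Return every on/off mask over the skip connections, e.g. (1, 0, 0, 1)."""
    return list(product((0, 1), repeat=num_skips))


def describe(mask):
    """Name the active skip connections in a mask, numbering levels from 1."""
    active = [f"L{i + 1}" for i, on in enumerate(mask) if on]
    return "+".join(active) if active else "none"


def best_combination(scores):
    """Return the mask with the highest score from a {mask: score} mapping."""
    return max(scores, key=scores.get)


masks = skip_combinations()
print(len(masks))            # 16 candidate configurations
print(describe(masks[0]))    # none
print(describe(masks[-1]))   # L1+L2+L3+L4
```

Running `best_combination` separately on each dataset's measured scores is one way to check whether the best mask changes across datasets, which is what finding ii) reports.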