Harmonized Spatial and Spectral Learning for Robust and Generalized Medical Image Segmentation (2401.10373v2)
Abstract: Deep learning has demonstrated remarkable achievements in medical image segmentation. However, prevailing deep learning models generalize poorly because of (i) intra-class variations, where the same class appears differently in different samples, and (ii) inter-class independence, which makes it difficult to capture intricate relationships between distinct objects and leads to more false negatives. This paper presents a novel approach that synergizes spatial and spectral representations to enhance domain-generalized medical image segmentation. We introduce the Spectral Correlation Coefficient objective to improve the model's capacity to capture middle-order features and contextual long-range dependencies. This objective complements traditional spatial objectives by incorporating valuable spectral information. Extensive experiments reveal that optimizing this objective with existing architectures such as UNet and TransUNet significantly enhances generalization, interpretability, and noise robustness, producing more confident predictions. For instance, in cardiac segmentation, we observe improvements in DSC of 0.81 pp and 1.63 pp (pp = percentage point) over UNet and TransUNet, respectively. Our interpretability study demonstrates that, in most tasks, objectives optimized with UNet outperform even TransUNet by introducing global contextual information alongside local details. These findings underscore the versatility and effectiveness of our proposed method across diverse imaging modalities and medical domains.
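To make the idea of pairing a spatial objective with a spectral one concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not define the Spectral Correlation Coefficient, so `spectral_correlation` is a hypothetical stand-in that takes the Pearson correlation between the 2D Fourier amplitude spectra of two masks, and `combined_loss` with its `weight` parameter is likewise an assumed way of mixing the two terms. Only `dice_score` follows a standard definition (the DSC metric reported in the abstract).

```python
import numpy as np


def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def spectral_correlation(pred, target, eps=1e-7):
    """Hypothetical spectral term (an assumption, not the paper's definition):
    Pearson correlation between the 2D Fourier amplitude spectra."""
    amp_pred = np.abs(np.fft.fft2(np.asarray(pred, dtype=float))).ravel()
    amp_target = np.abs(np.fft.fft2(np.asarray(target, dtype=float))).ravel()
    # Center both spectra so the dot product below is a correlation.
    amp_pred = amp_pred - amp_pred.mean()
    amp_target = amp_target - amp_target.mean()
    denom = np.linalg.norm(amp_pred) * np.linalg.norm(amp_target) + eps
    return float(amp_pred @ amp_target / denom)


def combined_loss(pred, target, weight=0.5):
    """Assumed mixing rule: spatial Dice loss plus a weighted spectral loss."""
    spatial = 1.0 - dice_score(pred, target)
    spectral = 1.0 - spectral_correlation(pred, target)
    return spatial + weight * spectral


# A 4x4 square in an 8x8 image, and the same square shifted two columns right.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
shifted = np.roll(mask, 2, axis=1)

dice_same = dice_score(mask, mask)
dice_shifted = dice_score(mask, shifted)
corr_shifted = spectral_correlation(mask, shifted)
```

In this toy case the shifted mask overlaps only half of the original, so the Dice term penalizes it, while the amplitude spectrum is unchanged by a circular shift and the spectral term does not. The two terms therefore respond to different kinds of error, which is the general motivation for combining them; the specific behavior of the paper's objective may differ.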