One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts (2312.17183v3)
Abstract: In this study, we aim to build a model that can Segment Anything in radiology scans, driven by Text prompts, termed SAT. Our main contributions are threefold: (i) for dataset construction, we construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; we then build the largest and most comprehensive segmentation dataset for training, by collecting over 22K 3D medical image scans from 72 segmentation datasets, across 497 classes, with careful standardization of both image scans and label space; (ii) for architecture design, we propose to inject medical knowledge into a text encoder via contrastive learning, and then formulate a universal segmentation model that can be prompted by feeding in medical terminologies in text form; (iii) as a result, we have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters), demonstrating performance comparable to 72 specialist nnU-Nets, each trained on an individual dataset or subset. We validate SAT as a foundational segmentation model, with better generalization ability on external (unseen) datasets, which can be further improved on specific tasks after fine-tuning adaptation. Compared with interactive segmentation models such as MedSAM, a segmentation model prompted by text enables superior performance, scalability, and robustness. As a use case, we demonstrate that SAT can act as a powerful out-of-the-box agent for LLMs, enabling visual grounding in clinical procedures such as report generation. All the data, code, and models in this work have been released.