Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography (2405.18356v1)
Abstract: The advancement of AI for organ segmentation and tumor detection is propelled by the growing availability of computed tomography (CT) datasets with detailed, per-voxel annotations. However, these AI models often lack flexibility for partially annotated datasets and extensibility to new classes because of limitations in one-hot encoding, architectural design, and learning scheme. To overcome these limitations, we propose a universal, extensible framework that enables a single model, termed Universal Model, to handle multiple public datasets and adapt to new classes (e.g., organs/tumors). First, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models (LLMs), enriching semantic encoding compared with one-hot encoding. Second, we replace the conventional output layers with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors and easing the addition of new classes. We train Universal Model on 3,410 CT volumes assembled from 14 publicly available datasets and then test it on 6,173 CT volumes from four external datasets. Universal Model achieves first place on six CT tasks in the Medical Segmentation Decathlon (MSD) public leaderboard and leading performance on the Beyond The Cranial Vault (BTCV) dataset. In summary, Universal Model exhibits remarkable computational efficiency (6x faster than other dataset-specific models), demonstrates strong generalization across different hospitals, transfers well to numerous downstream tasks, and, more importantly, is readily extensible to new classes while alleviating catastrophic forgetting of previously learned classes. Code, models, and datasets are available at https://github.com/ljwztc/CLIP-Driven-Universal-Model
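The two design ideas named in the abstract, a language-driven parameter generator and lightweight class-specific heads, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the names `make_generator`, `generate_head`, `predict`, and `fake_text_embedding` are hypothetical, the generator is a single random linear map, the text embeddings are seeded random vectors standing in for real language-model embeddings, and each head is a per-voxel linear layer followed by an independent sigmoid.

```python
import math
import random

EMBED_DIM = 8  # size of the stand-in text embedding (hypothetical value)
FEAT_DIM = 4   # channels of the per-voxel image feature (hypothetical value)


def make_generator(seed=0):
    """Random linear map: text embedding -> FEAT_DIM head weights plus 1 bias."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.5) for _ in range(EMBED_DIM)]
            for _ in range(FEAT_DIM + 1)]


def fake_text_embedding(name):
    """Stand-in for a language-model embedding of a class name (assumption)."""
    rng = random.Random(name)  # string seed keeps the vector deterministic
    return [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]


def generate_head(generator, text_embedding):
    """Turn one class's text embedding into that class's head parameters."""
    params = [sum(w * e for w, e in zip(row, text_embedding))
              for row in generator]
    return params[:-1], params[-1]  # (weights, bias)


def predict(voxel_features, heads):
    """Apply each class-specific head to each voxel; one sigmoid per class."""
    out = {}
    for name, (weights, bias) in heads.items():
        out[name] = [
            1.0 / (1.0 + math.exp(-(sum(w * f for w, f in zip(weights, feat)) + bias)))
            for feat in voxel_features
        ]
    return out


generator = make_generator()
heads = {c: generate_head(generator, fake_text_embedding(c))
         for c in ["liver", "pancreas"]}
voxels = [[0.2, -1.0, 0.5, 0.3], [1.5, 0.1, -0.7, 0.0]]  # two toy voxels
before = predict(voxels, heads)

# Adding a class only adds one head; the existing heads are left untouched.
heads["kidney tumor"] = generate_head(generator, fake_text_embedding("kidney tumor"))
after = predict(voxels, heads)
```

In this toy setting, adding "kidney tumor" leaves the liver and pancreas outputs unchanged because no shared output layer is resized. In a trained model, fine-tuning shared components on a new class could still shift earlier predictions, which is presumably why the abstract speaks of alleviating, not eliminating, catastrophic forgetting.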
Authors: Jie Liu, Yixiao Zhang, Kang Wang, Mehmet Can Yavuz, Xiaoxi Chen, Yixuan Yuan, Haoliang Li, Yang Yang, Alan Yuille, Yucheng Tang, Zongwei Zhou