Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks (2312.07630v2)
Abstract: Recent advancements in foundation models, typically trained with self-supervised learning on large-scale and diverse datasets, have shown great potential in medical image analysis. However, due to the significant spatial heterogeneity of medical imaging data, current models must tailor specific structures to different datasets, which makes it difficult to leverage the abundant unlabeled data. In this work, we propose a universal foundation model for medical image analysis that processes images with heterogeneous spatial properties using a unified structure. To build such a model, we propose spatially adaptive networks (SPAD-Nets), a family of networks that dynamically adjust their structures to adapt to the spatial properties of input images. We pre-train a spatially adaptive visual tokenizer (SPAD-VT) and then a spatially adaptive Vision Transformer (SPAD-ViT) via masked image modeling (MIM) on 55 public medical image datasets. The pre-training data comprises over 9 million image slices and is, to our knowledge, the largest, most comprehensive, and most diverse dataset for pre-training universal foundation models for medical image analysis. Experimental results on downstream medical image classification and segmentation tasks demonstrate the superior performance and label efficiency of our model. Our code is available at https://github.com/function2-llx/PUMIT.
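The abstract does not spell out how the networks adapt to spatial properties or how masking is applied, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes inputs are stored as (depth, height, width) volumes, with 2D images having depth 1, and uses a hypothetical rule that halves the depth patch size until it fits the input. The function names, the base patch size of 16, and the mask ratio are all assumptions made for illustration.

```python
import random


def adaptive_patch_size(shape, base=(16, 16, 16)):
    """Hypothetical rule: halve the depth patch size until it fits the input depth.

    `shape` is (depth, height, width). A 2D image is treated as a depth-1 volume.
    """
    depth = shape[0]
    patch_depth = base[0]
    while patch_depth > depth:
        patch_depth //= 2
    # Guard against a zero patch size for degenerate inputs.
    return (max(patch_depth, 1), base[1], base[2])


def token_grid(shape, patch):
    """Number of non-overlapping patches (tokens) along each axis."""
    return tuple(s // p for s, p in zip(shape, patch))


def random_mask(grid, mask_ratio, seed=0):
    """Generic MIM-style masking: mark a random subset of tokens as masked."""
    num_tokens = grid[0] * grid[1] * grid[2]
    num_masked = round(num_tokens * mask_ratio)
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [index in masked for index in range(num_tokens)]


# The same code path handles a 2D image and a 3D volume.
for shape in [(1, 224, 224), (96, 224, 224)]:
    patch = adaptive_patch_size(shape)
    grid = token_grid(shape, patch)
    mask = random_mask(grid, mask_ratio=0.5)
    print(shape, patch, grid, sum(mask), "of", len(mask), "tokens masked")
```

Under these assumptions, a 2D image and a 3D volume pass through the same code path and differ only in the depth patch size, which is the kind of unified handling the abstract describes at a high level.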