Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography (2403.17834v2)

Published 26 Mar 2024 in cs.CV

Abstract: While computer vision has achieved tremendous success with multimodal encoding and direct textual interaction with images via chat-based LLMs, similar advancements in medical imaging AI, particularly in 3D imaging, have been limited due to the scarcity of comprehensive datasets. To address this critical gap, we introduce CT-RATE, the first dataset that pairs 3D medical images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast 3D chest CT scans from 21,304 unique patients. Through various reconstructions, these scans are expanded to 50,188 volumes, totaling over 14.3 million 2D slices. Each scan is accompanied by its corresponding radiology report. Leveraging CT-RATE, we develop CT-CLIP, a CT-focused contrastive language-image pretraining framework designed for broad applications without the need for task-specific training. We demonstrate how CT-CLIP can be used in two tasks: multi-abnormality detection and case retrieval. Remarkably, in multi-abnormality detection, CT-CLIP outperforms state-of-the-art fully supervised models across all key metrics, effectively eliminating the need for manual annotation. In case retrieval, it efficiently retrieves relevant cases using either image or textual queries, thereby enhancing knowledge dissemination. By combining CT-CLIP's vision encoder with a pretrained LLM, we create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes. Finetuned on over 2.7 million question-answer pairs derived from the CT-RATE dataset, CT-CHAT surpasses other multimodal AI assistants, underscoring the necessity for specialized methods in 3D medical imaging. Collectively, the open-source release of CT-RATE, CT-CLIP, and CT-CHAT not only addresses critical challenges in 3D medical imaging but also lays the groundwork for future innovations in medical AI and improved patient care.

Leveraging CT-RATE and CT-CLIP for Advanced Multi-Abnormality Detection in Chest CT Volumes

Introduction to CT-RATE and CT-CLIP

The paper introduces CT-RATE, a pioneering dataset pairing non-contrast chest CT volumes with corresponding radiology reports, and CT-CLIP, a contrastive language-image pretraining framework optimized for this dataset. CT-RATE encompasses 25,692 CT volumes (expanded to 50,188 through reconstructions) from 21,304 unique patients. Leveraging this dataset, CT-CLIP establishes a new benchmark for multi-abnormality detection in chest CT scans, outperforming fully supervised methods without requiring manual annotation.
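
To make the contrastive pretraining idea concrete, here is a minimal sketch of a CLIP-style symmetric loss over a batch of paired 3D-CT and report embeddings. It assumes a 3D vision encoder and a text encoder that each produce fixed-size embeddings; the function name and temperature value are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(volume_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over a batch of paired
    3D CT volume and radiology report embeddings."""
    # L2-normalize both modalities so dot products are cosine similarities.
    volume_emb = F.normalize(volume_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(volume_i, report_j).
    logits = volume_emb @ text_emb.t() / temperature

    # Matching (volume, report) pairs lie on the diagonal.
    targets = torch.arange(volume_emb.size(0), device=volume_emb.device)

    # Contrast volumes against reports and reports against volumes.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return (loss_v2t + loss_t2v) / 2
```

Training with such a loss pulls each volume embedding toward its own report and pushes it away from the other reports in the batch, which is what later enables prompt-based zero-shot detection and cross-modal retrieval.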

CT-RATE Dataset: A Novel Resource

CT-RATE stands as the first comprehensive 3D medical imaging dataset that merges images with textual radiology reports. The significance of this dataset lies in its ability to facilitate the training of more sophisticated models capable of understanding the complex interplay between visual features and textual descriptions in medical imaging. This advancement addresses a critical gap in available datasets for computational research in 3D medical imaging.

CT-CLIP: Setting New Standards

CT-CLIP, developed on the CT-RATE dataset, demonstrates remarkable capabilities in zero-shot multi-abnormality detection. The model achieves superior performance across all key metrics when compared to state-of-the-art, fully supervised methods. Its achievements can be summarized as follows:

  • Outperforms fully supervised approaches in multi-abnormality detection without requiring task-specific training (see the zero-shot scoring sketch after this list).
  • Supports case retrieval with both image and text queries, promoting more efficient dissemination of medical knowledge.
  • Its open-source release, together with CT-RATE, is poised to advance medical AI by improving the analysis of 3D imaging and fostering innovation in healthcare applications.
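
To illustrate how zero-shot multi-abnormality detection can work in such a framework, the sketch below scores each abnormality by comparing the volume embedding with text embeddings of a "present" and an "absent" prompt. The prompt wording, the `encode_text` callable, and the softmax over the two prompts are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_abnormality_scores(volume_emb, encode_text, abnormalities):
    """Score each abnormality with a 'present' vs. 'absent' prompt pair.

    volume_emb:    (1, d) embedding of one CT volume from the vision encoder.
    encode_text:   callable mapping a list of strings to (len, d) embeddings.
    abnormalities: list of abnormality names, e.g. ["cardiomegaly", ...].
    """
    volume_emb = F.normalize(volume_emb, dim=-1)
    scores = {}
    for name in abnormalities:
        prompts = [f"{name} is present.", f"{name} is not present."]
        text_emb = F.normalize(encode_text(prompts), dim=-1)  # (2, d)
        sims = volume_emb @ text_emb.t()                       # (1, 2)
        # Softmax over the two prompts gives a pseudo-probability of "present".
        scores[name] = F.softmax(sims, dim=-1)[0, 0].item()
    return scores
```

Because no abnormality-specific classifier is trained, adding a new finding only requires writing a new prompt pair.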

Implications and Future Directions

The development of CT-CLIP and the presentation of the CT-RATE dataset have several important implications:

  • Reduction of Manual Annotation Effort: CT-CLIP's ability to exceed the performance of supervised methods in detecting multiple abnormalities marks a critical step toward reducing reliance on labor-intensive manual annotation in medical imaging.
  • Advancement in Case Retrieval: CT-CLIP's efficacy in retrieving cases from both image-based and text-based queries can significantly expedite the review of relevant past cases, potentially improving diagnostic accuracy and patient care (see the retrieval sketch after this list).
  • Foundation for Future Research: The release of CT-RATE is set to catalyze further research in medical image analysis. Future work could extend CT-CLIP's capabilities to other imaging modalities or to more granular abnormality detection and classification tasks.
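
Case retrieval in a shared embedding space reduces to nearest-neighbor search, as the minimal sketch below shows. The precomputed gallery of volume embeddings, the `case_ids` list, and the cosine-similarity ranking are assumptions about a straightforward implementation, not the authors' code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_similar_cases(query_emb, gallery_embs, case_ids, top_k=5):
    """Rank archived CT volumes by cosine similarity to a query embedding.

    The query can come from either the vision encoder (image query) or the
    text encoder (text query), since both modalities share one embedding space.
    """
    query_emb = F.normalize(query_emb, dim=-1)        # (1, d)
    gallery_embs = F.normalize(gallery_embs, dim=-1)  # (n, d)
    sims = (query_emb @ gallery_embs.t()).squeeze(0)  # (n,)
    top_scores, top_idx = sims.topk(min(top_k, sims.numel()))
    return [(case_ids[i], s.item()) for i, s in zip(top_idx.tolist(), top_scores)]
```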

Conclusion

The introduction of CT-RATE and CT-CLIP represents a significant step forward in the computational analysis of medical imaging, specifically in the context of chest CT scans. By achieving unprecedented performance in multi-abnormality detection and facilitating efficient case retrieval, this work sets a new benchmark in the field. Looking ahead, the potential applications of this research in improving diagnostic workflows and patient outcomes are vast, with the open-source release ensuring broad accessibility and encouraging continued innovation in medical AI research.

Authors (19)
  1. Ibrahim Ethem Hamamci (11 papers)
  2. Sezgin Er (8 papers)
  3. Furkan Almas (2 papers)
  4. Ayse Gulnihan Simsek (2 papers)
  5. Sevval Nil Esirgun (2 papers)
  6. Muhammed Furkan Dasdelen (2 papers)
  7. Bastian Wittmann (7 papers)
  8. Enis Simsar (20 papers)
  9. Mehmet Simsar (1 paper)
  10. Emine Bensu Erdemir (1 paper)
  11. Abdullah Alanbay (1 paper)
  12. Anjany Sekuboyina (32 papers)
  13. Berkan Lafci (3 papers)
  14. Bjoern Menze (116 papers)
  15. Irem Dogan (2 papers)
  16. Omer Faruk Durugol (1 paper)
  17. Tamaz Amiranashvili (12 papers)
  18. Christian Bluethgen (20 papers)
  19. Mehmet Kemal Ozdemir (6 papers)
Citations (17)