CLIP in Medical Imaging: A Comprehensive Survey (2312.07353v5)

Published 12 Dec 2023 in cs.CV

Abstract: Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training paradigm, successfully introduces text supervision to vision models. It has shown promising results across various tasks, attributable to its generalizability and interpretability. The use of CLIP has recently gained increasing interest in the medical imaging domain, serving both as a pre-training paradigm for aligning medical vision and language, and as a critical component in diverse clinical tasks. With the aim of facilitating a deeper understanding of this promising direction, this survey offers an in-depth exploration of the CLIP paradigm within the domain of medical imaging, regarding both refined CLIP pre-training and CLIP-driven applications. In this study, we (1) start with a brief introduction to the fundamentals of CLIP methodology. (2) Then, we investigate the adaptation of CLIP pre-training in the medical domain, focusing on how to optimize CLIP given the characteristics of medical images and reports. (3) Furthermore, we explore the practical utilization of CLIP pre-trained models in various tasks, including classification, dense prediction, and cross-modal tasks. (4) Finally, we discuss existing limitations of CLIP in the context of medical imaging and propose forward-looking directions to address the demands of the medical imaging domain. We expect that this comprehensive survey will provide researchers in the field of medical image analysis with a holistic understanding of the CLIP paradigm and its potential implications. The project page can be found at https://github.com/zhaozh10/Awesome-CLIP-in-Medical-Imaging.

An Expert Examination of "CLIP in Medical Imaging: A Comprehensive Survey"

The paper "CLIP in Medical Imaging: A Comprehensive Survey" analyzes the Contrastive Language-Image Pre-training (CLIP) paradigm in the context of medical imaging, covering its implications, its current challenges, and prospects for future research. Authored by Zhao et al., the survey examines what is involved in applying CLIP to medical images and how text and visual modalities can together advance the state of the art in medical image analysis.

CLIP's core advantage lies in its pre-training method, which aligns images and text in a shared latent space and enables robust zero-shot performance across diverse downstream tasks. This capability motivates its use in medical imaging, where images are frequently accompanied by text-rich annotations and reports. The survey organizes its analysis into four parts: the fundamentals of CLIP, its adaptation for medical images, its usage across different tasks, and the challenges that lie ahead.
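The alignment in a shared latent space described above is commonly trained with a symmetric contrastive objective over a batch of paired image and text embeddings. The following is a minimal NumPy sketch of that idea, not the survey's or any library's implementation. The function names, the temperature value of 0.07, and the random toy embeddings are assumptions chosen for illustration; in practice the embeddings would come from trained image and text encoders.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for N paired (image, text) embeddings.

    Row i of image_emb and row i of text_emb are assumed to be a matched pair;
    every other row in the batch serves as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    diag = np.arange(logits.shape[0])
    # Image-to-text: each image should pick its own report among all texts.
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Text-to-image: each report should pick its own image among all images.
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_i2t + loss_t2i)


# Toy check with hypothetical embeddings: matched pairs score a lower loss
# than deliberately mismatched ones.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
aligned_loss = clip_contrastive_loss(emb, emb)
mismatched_loss = clip_contrastive_loss(emb, emb[::-1])
```

Minimizing this loss pulls matched image and text embeddings together and pushes mismatched pairs apart, which is what makes similarity-based zero-shot inference possible later.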

Key Challenges and Adaptations: The paper identifies three primary challenges in adapting CLIP to medical imaging: the need for multi-scale feature extraction, the relative scarcity of paired image-text datasets, and the need to infuse models with medical domain-specific knowledge. These challenges are non-trivial because diagnostically relevant findings in medical images are often fine-grained, so high-level semantic alignment alone is less effective unless supplemented with finer-scale awareness. The authors also treat the scarcity of large paired medical datasets as a limitation in its own right and emphasize the importance of data-efficient learning techniques.

Several refined strategies for CLIP pre-training in medical imaging are explored, including multi-scale contrasts, correlation-driven contrastive mechanisms, and explicit incorporation of medical knowledge. Collectively, these approaches push the boundaries of CLIP's utility beyond its initial design, aiming to enhance both the breadth and depth of feature representations.

Applications and Tasks: The paper highlights CLIP's versatility through its integration in various tasks such as classification, segmentation, detection, and cross-modality applications. Specifically, zero-shot classification exemplifies the potential of CLIP in deploying diagnostic systems without extensive retraining on domain-specific data. In segmentation and detection, CLIP's ability to finely localize anomalies or regions of interest showcases its compatibility with pixel-level tasks, thereby extending its utility in facilitating more automated and detailed interpretations of medical images.
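Zero-shot classification with a CLIP-style model typically compares an image embedding against the embeddings of one text prompt per candidate label and picks the most similar prompt. The sketch below illustrates only that comparison step. The three-dimensional vectors, the label names, and the function name `zero_shot_classify` are hypothetical stand-ins; real embeddings would be produced by pre-trained image and text encoders from prompts such as "a chest X-ray showing pneumonia".

```python
import numpy as np


def zero_shot_classify(image_emb, prompt_embs, labels, temperature=0.07):
    """Pick the label whose prompt embedding is most similar to the image.

    image_emb:   (D,) embedding of one image.
    prompt_embs: (K, D) embeddings of K text prompts, one per label.
    Returns the predicted label and the softmax probabilities over labels.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature  # (K,) scaled cosine similarities
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return labels[int(np.argmax(probs))], probs


# Hypothetical toy embeddings standing in for encoder outputs.
labels = ["pneumonia", "no finding"]
prompt_embs = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
image_emb = np.array([0.9, 0.1, 0.0])

predicted, probs = zero_shot_classify(image_emb, prompt_embs, labels)
print(predicted)  # pneumonia
```

Because the label set is defined only by the prompts, new categories can be added at inference time by writing new prompts, with no retraining of the encoders.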

Future Directions: The authors elaborate on prospective challenges and avenues for improvement. Among these are aligning pre-training paradigms more closely with specific clinical applications to yield more robust models, and evaluating both the image and text encoders comprehensively to build confidence in applied settings. They also stress the importance of extending CLIP's pre-training framework to domains beyond chest imaging, thus broadening its impact across medical modalities.

Conclusion: The paper underscores CLIP's potential to advance medical imaging by combining visual and textual data. While highlighting the progress already made, it sets the stage for further work on existing barriers, calling for more sophisticated, knowledge-enhanced models that adapt across diverse healthcare applications. Its insights provide a foundation for continued exploration and development in this rapidly evolving domain.

  182. Tpro: Text-prompting-based weakly supervised histopathology tissue segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 109–118.
  183. Knowledge-enhanced visual-language pre-training on chest radiology images. Nature Communications 14, 4542.
  184. Pmc-vqa: Visual instruction tuning for medical visual question answering. arXiv preprint arXiv:2305.10415 .
  185. Text-guided foundation model adaptation for pathological image classification, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 272–282.
  186. Contrastive learning of medical visual representations from paired images and text, in: Machine Learning for Healthcare Conference, PMLR. pp. 2–25.
  187. Continual learning for abdominal multi-organ and tumor segmentation, in: International conference on medical image computing and computer-assisted intervention, Springer. pp. 35–45.
  188. Hybrid high-order functional connectivity networks using resting-state functional mri for mild cognitive impairment diagnosis. Scientific reports 7, 6530.
  189. Rethinking graph convolutional networks in knowledge graph completion, in: Proceedings of the ACM Web Conference 2022, pp. 798–807.
  190. Diagnose like a radiologist: Hybrid neuro-probabilistic reasoning for attribute-based medical image diagnosis. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7400–7416.
  191. Chatcad+: Towards a universal and reliable interactive cad using llms. arXiv preprint arXiv:2305.15964 .
  192. Exploring low-resource medical image classification with weakly supervised prompt learning. Available at SSRN 4578827 .
  193. Learning deep features for discriminative localization, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921–2929.
  194. Cross-modal translation and alignment for survival analysis, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 21485–21494.
  195. Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports. Nature Machine Intelligence 4, 32–40.
  196. Advancing radiograph representation learning with masked record modeling, in: The Eleventh International Conference on Learning Representations.
  197. Conditional prompt learning for vision-language models, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16816–16825.
  198. Learning to prompt for vision-language models. International Journal of Computer Vision 130, 2337–2348.
  199. Anomalyclip: Object-agnostic prompt learning for zero-shot anomaly detection. arXiv preprint arXiv:2310.18961 .
  200. Collaborative learning of semi-supervised segmentation and classification for medical images, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2079–2088.
  201. Multimodal c4: An open, billion-scale corpus of images interleaved with text. arXiv preprint arXiv:2304.06939 .
  202. Vision transformers for dense prediction: A survey. Knowledge-Based Systems 253, 109552.
Authors (11)
  1. Zihao Zhao (42 papers)
  2. Yuxiao Liu (16 papers)
  3. Han Wu (124 papers)
  4. Yonghao Li (7 papers)
  5. Sheng Wang (239 papers)
  6. Lin Teng (3 papers)
  7. Disheng Liu (5 papers)
  8. Zhiming Cui (34 papers)
  9. Qian Wang (453 papers)
  10. Dinggang Shen (153 papers)
  11. Mei Wang (41 papers)
Citations (35)