MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning (2401.01591v1)

Published 3 Jan 2024 in cs.CV

Abstract: Existing contrastive language-image pre-training aims to learn a joint representation by matching abundant image-text pairs. However, the number of image-text pairs in medical datasets is usually orders of magnitude smaller than that in natural-image datasets. Moreover, medical image-text pairs often involve numerous complex fine-grained correspondences. This paper aims to enhance data efficiency by introducing multiple-to-multiple local relationship modeling to capture denser supervision. More specifically, we propose a Medical Language-Image Pre-training (MLIP) framework, which exploits limited medical image-text data more efficiently through patch-sentence matching. Furthermore, we introduce a masked contrastive learning strategy with semantic integrity estimation to reduce redundancy in images while preserving the underlying semantics. Our evaluation results show that MLIP outperforms previous work in zero/few-shot classification and few-shot segmentation tasks by a large margin.
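
The abstract names two ingredients: multiple-to-multiple patch-sentence matching for denser supervision, and masked contrastive learning guided by semantic integrity estimation. The sketch below is a minimal, hypothetical PyTorch illustration of what such a scheme could look like; the aggregation rule (max over patches/sentences, then mean), the temperature, the keep ratio, and the semantic-integrity score are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of patch-sentence contrastive matching and
# semantic-integrity-guided patch masking. Not the paper's released code.
import torch
import torch.nn.functional as F


def patch_sentence_contrastive_loss(patch_emb, sent_emb, temperature=0.07):
    """Multiple-to-multiple local alignment between image patches and report sentences.

    patch_emb: (B, P, D) embeddings of P image patches per study
    sent_emb:  (B, S, D) embeddings of S report sentences per study
    Returns a symmetric InfoNCE-style loss over the batch.
    """
    patch_emb = F.normalize(patch_emb, dim=-1)
    sent_emb = F.normalize(sent_emb, dim=-1)
    B = patch_emb.size(0)

    # All image-report pairs in the batch: local similarity of every patch
    # to every sentence -> (B_img, B_txt, P, S).
    local_sim = torch.einsum('ipd,jsd->ijps', patch_emb, sent_emb)

    # One plausible multiple-to-multiple reduction: each sentence attends to
    # its best-matching patch (and vice versa), then average.
    img2txt = local_sim.max(dim=2).values.mean(dim=2)  # (B, B)
    txt2img = local_sim.max(dim=3).values.mean(dim=2)  # (B, B)

    targets = torch.arange(B, device=patch_emb.device)
    loss_i2t = F.cross_entropy(img2txt / temperature, targets)
    loss_t2i = F.cross_entropy(txt2img.t() / temperature, targets)
    return 0.5 * (loss_i2t + loss_t2i)


def mask_patches_keep_semantics(patch_emb, integrity_score, keep_ratio=0.5):
    """Masked contrastive step: drop redundant patches, keeping those with the
    highest 'semantic integrity' score (e.g., a text-guided saliency estimate;
    the scoring function is left abstract here).
    patch_emb: (B, P, D); integrity_score: (B, P)."""
    B, P, D = patch_emb.shape
    k = max(1, int(P * keep_ratio))
    keep_idx = integrity_score.topk(k, dim=1).indices                # (B, k)
    return patch_emb.gather(1, keep_idx.unsqueeze(-1).expand(B, k, D))
```

In this reading, the masked subset of patches returned by mask_patches_keep_semantics would be fed to the same contrastive loss, so training sees fewer, less redundant patches per step while the retained ones still carry the report's semantics.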

Authors (7)
  1. Jiarun Liu (17 papers)
  2. Hong-Yu Zhou (50 papers)
  3. Cheng Li (1094 papers)
  4. Weijian Huang (19 papers)
  5. Hao Yang (328 papers)
  6. Yong Liang (32 papers)
  7. Shanshan Wang (167 papers)
Citations (4)
