MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning (2401.01591v1)
Abstract: Existing contrastive language-image pre-training learns a joint representation by matching abundant image-text pairs. However, medical datasets typically contain orders of magnitude fewer image-text pairs than natural-image datasets, and medical image-text pairs often involve numerous complex fine-grained correspondences. This paper aims to enhance data efficiency by introducing multiple-to-multiple local relationship modeling to capture denser supervision. More specifically, we propose a Medical Language-Image Pre-training (MLIP) framework, which exploits limited medical image-text data more efficiently through patch-sentence matching. Furthermore, we introduce a masked contrastive learning strategy with semantic integrity estimation to reduce redundancy in images while preserving the underlying semantics. Our evaluation shows that MLIP outperforms previous work on zero-/few-shot classification and few-shot segmentation tasks by a large margin.
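The core ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the semantic-integrity estimator, the patch/sentence aggregation rule, and all function names below are simplified assumptions for illustration only. It shows (a) masking image patches by a toy "semantic" score instead of dropping them uniformly at random, (b) a many-to-many patch-sentence similarity aggregated into one image-report score, and (c) a symmetric contrastive (InfoNCE-style) loss over those scores.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_mask(patch_feats, keep_ratio=0.5):
    """Toy stand-in for 'semantic integrity estimation': keep the patches
    most similar to the mean patch feature. The paper's actual estimator
    is not specified here; this only illustrates non-random masking."""
    center = patch_feats.mean(axis=0, keepdims=True)
    scores = (patch_feats * center).sum(axis=1)
    k = max(1, int(keep_ratio * len(scores)))
    keep = np.sort(np.argsort(-scores)[:k])
    return patch_feats[keep]

def patch_sentence_score(patches, sentences):
    """Many-to-many local matching: build a patch-sentence cosine
    similarity matrix, let each sentence pick its best-matching patch,
    then average (one of several plausible aggregation choices)."""
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    s = sentences / np.linalg.norm(sentences, axis=1, keepdims=True)
    sim = p @ s.T                      # (n_patches, n_sentences)
    return sim.max(axis=0).mean()

def masked_contrastive_loss(images, reports, tau=0.07):
    """Symmetric contrastive loss where each image-report logit comes
    from masked local patch-sentence matching."""
    n = len(images)
    logits = np.zeros((n, n))
    for i, img in enumerate(images):
        masked = semantic_mask(img)
        for j, rep in enumerate(reports):
            logits[i, j] = patch_sentence_score(masked, rep) / tau
    diag = np.arange(n)
    loss_i = -np.log(softmax(logits, axis=1)[diag, diag]).mean()
    loss_t = -np.log(softmax(logits, axis=0)[diag, diag]).mean()
    return 0.5 * (loss_i + loss_t)
```

Compared with a single global image-text similarity, the patch-sentence matrix supplies one supervision signal per sentence, which is the "denser supervision" the abstract refers to.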
- Ekin Tiu et al., “Expert-level detection of pathologies from unannotated chest x-ray images via self-supervised learning,” Nature Biomedical Engineering, vol. 6, no. 12, pp. 1399–1406, December 2022.
- Zifeng Wang et al., “MedCLIP: Contrastive learning from unpaired medical images and text,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 3876–3887.
- Alec Radford et al., “Learning transferable visual models from natural language supervision,” in Proceedings of the 38th International Conference on Machine Learning, July 2021, vol. 139, pp. 8748–8763.
- Alistair E. W. Johnson et al., “The MIMIC-CXR database,” 2019, https://physionet.org/content/mimic-cxr/.
- Shih-Cheng Huang et al., “GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021, pp. 3922–3931.
- Zhengxin Pan et al., “Fine-grained image-text matching by cross-modal hard aligning network,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023, pp. 19275–19284.
- Fuying Wang et al., “Multi-granularity cross-modal alignment for generalized medical visual representation learning,” in Advances in Neural Information Processing Systems, 2022.
- Bo Liu et al., “Improving medical vision-language contrastive pretraining with semantics-aware triage,” IEEE Transactions on Medical Imaging, pp. 1–1, July 2023.
- Chaoyi Wu et al., “MedKLIP: Medical knowledge enhanced language-image pre-training for x-ray diagnosis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 21372–21383.
- Yanghao Li et al., “Scaling language-image pre-training via masking,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 23390–23400.
- Philip A. Bernstein et al., “Fast maintenance of semantic integrity assertions using redundant aggregate data,” in Readings in Artificial Intelligence and Databases, 1989, pp. 457–467.
- Yuhao Zhang et al., “Contrastive learning of medical visual representations from paired images and text,” in Proceedings of the 7th Machine Learning for Healthcare Conference, 2022, pp. 2–25.
- Benedikt Boecking et al., “Making the most of text semantics to improve biomedical vision–language processing,” in Computer Vision – ECCV 2022, 2022, pp. 1–21.
- George Shih et al., “Augmenting the national institutes of health chest radiograph dataset with expert annotations of possible pneumonia,” Radiology: Artificial Intelligence, vol. 1, no. 1, pp. e180041, January 2019.
- Anna Zawacki et al., “SIIM-ACR pneumothorax segmentation,” 2019, https://kaggle.com/competitions/siim-acr-pneumothorax-segmentation.
- Xiaosong Wang et al., “ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, July 2017.
- Hong-Yu Zhou et al., “Advancing radiograph representation learning with masked record modeling,” in The Eleventh International Conference on Learning Representations, 2023.
- Hong-Yu Zhou et al., “Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports,” Nature Machine Intelligence, vol. 4, no. 1, pp. 32–40, January 2022.
- Jiarun Liu
- Hong-Yu Zhou
- Cheng Li
- Weijian Huang
- Hao Yang
- Yong Liang
- Shanshan Wang