MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning (2401.01591v1)
Abstract: Existing contrastive language-image pre-training learns a joint representation by matching abundant image-text pairs. However, medical datasets typically contain orders of magnitude fewer image-text pairs than natural-image datasets, and medical image-text pairs often involve numerous complex fine-grained correspondences. This paper aims to enhance data efficiency by introducing multiple-to-multiple local relationship modeling to capture denser supervision. More specifically, we propose a Medical Language-Image Pre-training (MLIP) framework, which exploits limited medical image-text data more efficiently through patch-sentence matching. Furthermore, we introduce a masked contrastive learning strategy with semantic integrity estimation to reduce redundancy in images while preserving the underlying semantics. Our evaluation shows that MLIP outperforms previous work on zero-/few-shot classification and few-shot segmentation tasks by a large margin.
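The core ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the semantic-integrity estimator, the patch/sentence aggregation rule, and all function names below are simplified assumptions for illustration only. It shows (a) masking image patches by a toy "semantic" score instead of dropping them uniformly at random, (b) a many-to-many patch-sentence similarity aggregated into one image-report score, and (c) a symmetric contrastive (InfoNCE-style) loss over those scores.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def semantic_mask(patch_feats, keep_ratio=0.5):
    """Toy stand-in for 'semantic integrity estimation': keep the patches
    most similar to the mean patch feature. The paper's actual estimator
    is not specified here; this only illustrates non-random masking."""
    center = patch_feats.mean(axis=0, keepdims=True)
    scores = (patch_feats * center).sum(axis=1)
    k = max(1, int(keep_ratio * len(scores)))
    keep = np.sort(np.argsort(-scores)[:k])
    return patch_feats[keep]

def patch_sentence_score(patches, sentences):
    """Many-to-many local matching: build a patch-sentence cosine
    similarity matrix, let each sentence pick its best-matching patch,
    then average (one of several plausible aggregation choices)."""
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    s = sentences / np.linalg.norm(sentences, axis=1, keepdims=True)
    sim = p @ s.T                      # (n_patches, n_sentences)
    return sim.max(axis=0).mean()

def masked_contrastive_loss(images, reports, tau=0.07):
    """Symmetric contrastive loss where each image-report logit comes
    from masked local patch-sentence matching."""
    n = len(images)
    logits = np.zeros((n, n))
    for i, img in enumerate(images):
        masked = semantic_mask(img)
        for j, rep in enumerate(reports):
            logits[i, j] = patch_sentence_score(masked, rep) / tau
    diag = np.arange(n)
    loss_i = -np.log(softmax(logits, axis=1)[diag, diag]).mean()
    loss_t = -np.log(softmax(logits, axis=0)[diag, diag]).mean()
    return 0.5 * (loss_i + loss_t)
```

Compared with a single global image-text similarity, the patch-sentence matrix supplies one supervision signal per sentence, which is the "denser supervision" the abstract refers to.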
- Ekin Tiu et al., “Expert-level detection of pathologies from unannotated chest x-ray images via self-supervised learning,” Nature Biomedical Engineering, vol. 6, no. 12, pp. 1399–1406, December 2022.
- Zifeng Wang et al., “MedCLIP: Contrastive learning from unpaired medical images and text,” in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 3876–3887.
- Alec Radford et al., “Learning transferable visual models from natural language supervision,” in Proceedings of the 38th International Conference on Machine Learning, July 2021, vol. 139, pp. 8748–8763.
- Alistair E. W. Johnson et al., “The MIMIC-CXR database,” 2019, https://physionet.org/content/mimic-cxr/.
- Shih-Cheng Huang et al., “GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021, pp. 3922–3931.
- Zhengxin Pan et al., “Fine-grained image-text matching by cross-modal hard aligning network,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023, pp. 19275–19284.
- Fuying Wang et al., “Multi-granularity cross-modal alignment for generalized medical visual representation learning,” in Advances in Neural Information Processing Systems, 2022.
- Bo Liu et al., “Improving medical vision-language contrastive pretraining with semantics-aware triage,” IEEE Transactions on Medical Imaging, pp. 1–1, July 2023.
- Chaoyi Wu et al., “MedKLIP: Medical knowledge enhanced language-image pre-training for x-ray diagnosis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 21372–21383.
- Yanghao Li et al., “Scaling language-image pre-training via masking,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 23390–23400.
- Philip A. Bernstein et al., “Fast maintenance of semantic integrity assertions using redundant aggregate data,” in Readings in Artificial Intelligence and Databases, 1989, pp. 457–467.
- Yuhao Zhang et al., “Contrastive learning of medical visual representations from paired images and text,” in Proceedings of the 7th Machine Learning for Healthcare Conference, 2022, pp. 2–25.
- Benedikt Boecking et al., “Making the most of text semantics to improve biomedical vision–language processing,” in Computer Vision – ECCV 2022, 2022, pp. 1–21.
- George Shih et al., “Augmenting the national institutes of health chest radiograph dataset with expert annotations of possible pneumonia,” Radiology: Artificial Intelligence, vol. 1, no. 1, pp. e180041, January 2019.
- Anna Zawacki et al., “SIIM-ACR pneumothorax segmentation,” 2019, https://kaggle.com/competitions/siim-acr-pneumothorax-segmentation.
- Xiaosong Wang et al., “ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, July 2017.
- Hong-Yu Zhou et al., “Advancing radiograph representation learning with masked record modeling,” in The Eleventh International Conference on Learning Representations, 2023.
- Hong-Yu Zhou et al., “Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports,” Nature Machine Intelligence, vol. 4, no. 1, pp. 32–40, January 2022.
- Jiarun Liu
- Hong-Yu Zhou
- Cheng Li
- Weijian Huang
- Hao Yang
- Yong Liang
- Shanshan Wang