Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training (2401.01179v1)
Abstract: Modern healthcare often utilises radiographic images alongside textual reports for diagnostics, encouraging the use of Vision-Language Self-Supervised Learning (VL-SSL) with large pre-trained models to learn versatile medical vision representations. However, most existing VL-SSL frameworks are trained end-to-end, which is computation-heavy and can lose vital prior information embedded in pre-trained encoders. To address both issues, we introduce the backbone-agnostic Adaptor framework, which preserves medical knowledge in pre-trained image and text encoders by keeping them frozen, and employs a lightweight Adaptor module for cross-modal learning. Experiments on medical image classification and segmentation tasks across three datasets reveal that our framework delivers competitive performance while cutting trainable parameters by over 90% compared to current pre-training approaches. Notably, when fine-tuned with just 1% of data, Adaptor outperforms several Transformer-based methods trained on full datasets in medical image segmentation.
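Below is a minimal sketch of the frozen-backbone contrastive setup the abstract describes. It assumes a CLIP-style symmetric InfoNCE objective and simple MLP projection heads; the `Adaptor` class, feature dimensions, and the random tensors standing in for frozen-encoder outputs are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: frozen image/text encoders, trainable Adaptor,
# CLIP-style contrastive alignment. Architecture details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adaptor(nn.Module):
    """Lightweight trainable module that maps frozen image and text
    features into a shared space for contrastive learning."""
    def __init__(self, img_dim: int, txt_dim: int, shared_dim: int = 512):
        super().__init__()
        self.img_proj = nn.Sequential(
            nn.Linear(img_dim, shared_dim), nn.GELU(),
            nn.Linear(shared_dim, shared_dim))
        self.txt_proj = nn.Sequential(
            nn.Linear(txt_dim, shared_dim), nn.GELU(),
            nn.Linear(shared_dim, shared_dim))
        # Learnable temperature; exp(2.659) ~ 1/0.07, a common CLIP default.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor):
        z_img = F.normalize(self.img_proj(img_feat), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return z_img, z_txt

def contrastive_loss(z_img, z_txt, logit_scale):
    # Symmetric InfoNCE: matched image-report pairs on the diagonal
    # are positives, all other pairs in the batch are negatives.
    logits = logit_scale.exp() * z_img @ z_txt.t()
    targets = torch.arange(z_img.size(0), device=z_img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# In the full pipeline the pre-trained encoders would be frozen, e.g.
# for p in encoder.parameters(): p.requires_grad_(False)
# Random tensors stand in for their cached outputs here.
adaptor = Adaptor(img_dim=768, txt_dim=768)
img_feat = torch.randn(8, 768)  # frozen image-encoder features (stand-in)
txt_feat = torch.randn(8, 768)  # frozen text-encoder features (stand-in)
z_img, z_txt = adaptor(img_feat, txt_feat)
loss = contrastive_loss(z_img, z_txt, adaptor.logit_scale)
loss.backward()  # gradients reach only the Adaptor's parameters
```

Because the backbones never receive gradients, only the small projection heads are trained, which is consistent with the abstract's claim of cutting trainable parameters by over 90% relative to end-to-end pre-training.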
- Jiuming Qin
- Che Liu
- Sibo Cheng
- Yike Guo
- Rossella Arcucci