
Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning for Prognosis Prediction in Medical Imaging (2506.18434v1)

Published 23 Jun 2025 in cs.CV and cs.AI

Abstract: AI holds significant promise for improving prognosis prediction in medical imaging, yet its effective application remains challenging. In this work, we introduce a structured benchmark explicitly designed to evaluate and compare the transferability of Convolutional Neural Networks and Foundation Models in predicting clinical outcomes in COVID-19 patients, leveraging diverse publicly available Chest X-ray datasets. Our experimental methodology extensively explores a wide set of fine-tuning strategies, encompassing traditional approaches such as Full Fine-Tuning and Linear Probing, as well as advanced Parameter-Efficient Fine-Tuning methods including Low-Rank Adaptation, BitFit, VeRA, and IA3. The evaluations were conducted across multiple learning paradigms, including both extensive full-data scenarios and more clinically realistic Few-Shot Learning settings, which are critical for modeling rare disease outcomes and rapidly emerging health threats. By implementing a large-scale comparative analysis involving a diverse selection of pretrained models, including general-purpose architectures pretrained on large-scale datasets such as CLIP and DINOv2, to biomedical-specific models like MedCLIP, BioMedCLIP, and PubMedCLIP, we rigorously assess each model's capacity to effectively adapt and generalize to prognosis tasks, particularly under conditions of severe data scarcity and pronounced class imbalance. The benchmark was designed to capture critical conditions common in prognosis tasks, including variations in dataset size and class distribution, providing detailed insights into the strengths and limitations of each fine-tuning strategy. This extensive and structured evaluation aims to inform the practical deployment and adoption of robust, efficient, and generalizable AI-driven solutions in real-world clinical prognosis prediction workflows.

Summary

  • The paper benchmarks foundation models and PEFT techniques, revealing tradeoffs between full-data and few-shot scenarios.
  • The paper demonstrates that CNNs outperform FMs on small datasets while FMs with PEFT excel in larger, balanced settings.
  • The paper highlights the potential of few-shot learning with selective PEFT methods to enhance prognosis prediction in clinical imaging.

Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning for Prognosis Prediction in Medical Imaging

Introduction

The paper "Benchmarking Foundation Models and Parameter-Efficient Fine-Tuning for Prognosis Prediction in Medical Imaging" (2506.18434) explores the utilization of AI, specifically focusing on foundation models (FMs) and parameter-efficient fine-tuning (PEFT) techniques for prognosis prediction in medical imaging, a domain characterized by significant promise yet complex challenges. This paper focuses on COVID-19 patient data using chest X-ray datasets, assessing various models and fine-tuning strategies under full-data and few-shot learning scenarios.

Experimental Setup

The paper employs several well-curated datasets, including AIforCOVID, CoCross, COVID-19-AR, and Stony Brook COVID-19 datasets, each annotated with distinct prognostic outcomes. The benchmark includes a systematic comparison of CNNs and FMs, with pretraining strategies spanning supervised learning, self-supervised learning (SSL), and contrastive language-image pretraining.

  • Datasets Used: Diverse publicly available datasets with differing prognostic labels (e.g., mortality, severity).
  • Models Evaluated: Combination of traditional CNNs (e.g., ResNet, DenseNet) and advanced FMs (e.g., CLIP, DINOv2).
  • Fine-Tuning Techniques: Full Fine-Tuning (FFT), Linear Probing (LP), and PEFT strategies like Low-Rank Adaptation (LoRA), BitFit, VeRA, and IA³.
  • Validation Approaches: Leave-One-Center-Out (LOCO) and 5-fold cross-validation, tailored to dataset specifics.
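The LOCO protocol above can be sketched as a simple split generator: each fold holds out all samples from one acquisition center for testing and trains on the rest. This is an illustrative sketch, not the paper's implementation; the center identifiers and sample tuples below are hypothetical.

```python
# Minimal sketch of Leave-One-Center-Out (LOCO) validation splits.
# Center IDs and samples are illustrative, not the paper's actual data.
def loco_splits(samples):
    """Yield (held_out_center, train_set, test_set) for each center.

    samples: list of (center_id, sample_id) pairs.
    """
    centers = sorted({c for c, _ in samples})
    for held_out in centers:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

data = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("C", 5)]
folds = list(loco_splits(data))
# Three centers -> three folds; each fold's test set comes from one center.
```

Holding out whole centers, rather than random samples, tests whether a model generalizes across hospitals with different scanners and populations, which is the clinically relevant question.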


Figure 1: This figure includes only the fine-tuning techniques applicable to both CNN and FM architecture families.

Results

CNNs vs. FMs under Different Fine-Tuning Strategies

The paper indicates that CNNs, particularly when fully fine-tuned, outperform FMs on extremely small datasets, owing to their efficient training dynamics and strong inductive biases. FMs with PEFT, however, scale and adapt better as dataset size grows.

Figure 2: Performance averaged over all datasets, restricted to the fine-tuning techniques applicable to both CNN and FM architecture families.

PEFT Strategies: Efficiency vs. Effectiveness

PEFT methods are highly sensitive to dataset characteristics: they succeed in large, balanced settings but require careful configuration on highly imbalanced or small datasets. Techniques such as LoRA and BitFit yield substantial gains on DINOv2 variants under parameter-efficient adaptation.
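A back-of-envelope count clarifies why LoRA and BitFit are called parameter-efficient: for a single dense layer, they train orders of magnitude fewer parameters than full fine-tuning. The layer size and LoRA rank below are illustrative choices, not values from the paper.

```python
# Trainable-parameter comparison for one d_in x d_out dense layer:
# full fine-tuning vs. LoRA (rank-r adapters) vs. BitFit (biases only).
# d_in = d_out = 768 and r = 8 are illustrative, not the paper's settings.
def layer_params(d_in, d_out):
    return d_in * d_out + d_out  # weight matrix + bias vector

def lora_params(d_in, d_out, r):
    # LoRA freezes W and trains two low-rank factors:
    # A of shape (r, d_in) and B of shape (d_out, r).
    return r * d_in + d_out * r

d_in = d_out = 768
full = layer_params(d_in, d_out)
lora = lora_params(d_in, d_out, r=8)
bitfit = d_out  # BitFit trains only the bias terms

print(f"full:   {full}")
print(f"LoRA:   {lora} ({lora / full:.2%} of full)")
print(f"BitFit: {bitfit} ({bitfit / full:.2%} of full)")
```

For this layer, LoRA trains roughly 2% of the full parameter count and BitFit about 0.1%, which is why both remain viable when labeled data is scarce but can underperform when the distribution shift demands larger updates.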


Figure 3: One subplot per dataset, plotting MCC (y-axis) against the percentage of trainable parameters relative to the model total (x-axis).
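The Matthews correlation coefficient (MCC) reported in Figure 3 is well suited to the class imbalance common in prognosis datasets, since it accounts for all four confusion-matrix cells. A minimal binary-case implementation, for reference:

```python
import math

# Matthews correlation coefficient (MCC) for binary classification,
# computed from confusion-matrix counts. Returns a value in [-1, 1]:
# 1.0 for perfect prediction, ~0.0 for chance-level performance.
def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal count is zero
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays near zero for a classifier that always predicts the majority class, which is exactly the failure mode imbalanced prognosis labels invite.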

Few-Shot Learning Insights

Few-Shot Learning remains challenging under severe data constraints, yet LP and selective PEFT methods show promise by effectively leveraging pretrained representations. This is crucial in real-world scenarios such as rare diseases and emerging health crises.
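A few-shot setting like the one described here is typically constructed by sampling a fixed number of examples per outcome class before fine-tuning. The sketch below shows one such k-shot subsampler; the label names, pool, and k value are hypothetical, not the paper's protocol.

```python
import random

# Sketch of drawing a k-shot training subset: k samples per class.
# Labels ("severe"/"mild"), pool construction, and k are illustrative.
def k_shot_subset(labeled, k, seed=0):
    """labeled: list of (sample_id, label); returns up to k samples per label."""
    rng = random.Random(seed)  # fixed seed for reproducible episodes
    by_label = {}
    for sid, lab in labeled:
        by_label.setdefault(lab, []).append(sid)
    subset = []
    for lab, sids in sorted(by_label.items()):
        chosen = rng.sample(sids, min(k, len(sids)))
        subset.extend((sid, lab) for sid in chosen)
    return subset

pool = [(i, "severe" if i % 3 == 0 else "mild") for i in range(30)]
shots = k_shot_subset(pool, k=5)
# 2 classes x 5 shots -> 10 training samples
```

Fixing the per-class count (rather than sampling uniformly) keeps few-shot episodes balanced by construction, isolating the effect of data scarcity from that of class imbalance.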


Figure 4: This figure illustrates the average performance of PEFT techniques across different datasets, displaying overall adaptation efficiency.

Conclusion

The benchmark provides a methodological foundation for deploying FMs in prognosis tasks, offering insights into effective adaptation strategies for clinical workflows. The paper emphasizes that no universally optimal strategy exists; the choice depends on data availability, model architecture, and task-specific constraints.

The research invites further exploration of PEFT techniques in Few-Shot scenarios, reiterating their potential for robust generalization in real-world medical applications despite severe training constraints. Successfully leveraging FMs, combined with efficient fine-tuning, can significantly inform the adoption of robust AI systems in clinical prognosis prediction.
