TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification (2312.07125v2)
Abstract: Few-shot learning adapts models to tasks with very few samples. It is of particular significance for clinical tasks, given the high annotation cost of medical images. Several works have explored few-shot learning on medical images, yet they still require large numbers of medical images to pre-train models for domain-specific priors. Vision foundation models have recently achieved remarkable success on natural images, so rapidly adapting such models from natural images to few-shot clinical tasks holds great promise. The MedFMC challenge at NeurIPS 2023 was organized to shed more light on this topic. In this work, we present our challenge solution. We observe that a simple variant of fine-tuning with partial freezing shows remarkable performance, and empirical evidence demonstrates that it outperforms various common fine-tuning methods under limited sample sizes. Additionally, we explore better use of semantic supervision to boost performance. We propose a novel approach that contextualizes labels via LLMs, and find that the LLM-generated context significantly enhances the discriminability of semantic embeddings for similar categories, yielding a 3%-5% improvement in 1-shot settings over commonly used one-hot labels and other semantic supervision methods. Our solution secured first place in the MedFMC challenge.
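The partially frozen fine-tuning described in the abstract can be sketched as follows. This is a minimal illustration in PyTorch with a `timm` ViT backbone; the specific layers kept trainable (last transformer block, final norm, classification head), the class count, and the learning rate are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of fine-tuning with partial freezing (PyTorch + timm).
# Which layers stay trainable is an assumption for illustration.
import torch
import timm

# Pretrained ViT; num_classes is a placeholder for the target task.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=19)

# Freeze the entire backbone first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the last transformer block, the final norm, and the head.
for module in (model.blocks[-1], model.norm, model.head):
    for param in module.parameters():
        param.requires_grad = True

# Optimize only the trainable parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

Freezing most of the backbone keeps the pretrained features intact, which is why this tends to be robust when only a handful of labeled medical images are available.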
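The label-contextualization idea can likewise be sketched. Below is a hypothetical version: the two context sentences stand in for LLM-generated descriptions, and wiring the semantic supervision through CLIP's text encoder with a cosine-similarity cross-entropy loss is an assumption about one plausible realization, not the paper's exact method.

```python
# Minimal sketch of contextualizing labels for semantic supervision.
# Context strings stand in for LLM output; the CLIP text encoder and
# temperature value are assumptions for illustration.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

# Hypothetical LLM-generated contexts for two visually similar categories.
contexts = [
    "Pneumonia: a lung infection causing patchy airspace consolidation.",
    "Pulmonary edema: fluid accumulation producing diffuse bilateral haziness.",
]
with torch.no_grad():
    tokens = clip.tokenize(contexts).to(device)
    label_emb = F.normalize(clip_model.encode_text(tokens).float(), dim=-1)

def semantic_loss(image_features, targets, temperature=0.07):
    """Cross-entropy over cosine similarities to the label embeddings."""
    image_features = F.normalize(image_features, dim=-1)
    logits = image_features @ label_emb.t() / temperature
    return F.cross_entropy(logits, targets)
```

The point of the context sentences is that bare label names like "pneumonia" and "edema" embed close together, whereas descriptive contexts pull the label embeddings apart, which is the effect the abstract credits for the 1-shot gains.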
- Kaipeng Zheng
- Weiran Huang
- Lichao Sun