
Diversified in-domain synthesis with efficient fine-tuning for few-shot classification (2312.03046v2)

Published 5 Dec 2023 in cs.CV

Abstract: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. A recent research direction for improving few-shot classifiers augments the labeled samples with synthetic images created by state-of-the-art text-to-image generation models. Following this trend, we propose Diversified In-domain Synthesis with Efficient Fine-tuning (DISEF), a novel approach that addresses the generalization challenge in few-shot learning using synthetic data. DISEF consists of two main components. First, we propose a novel text-to-image augmentation pipeline that, by leveraging the real samples and the rich semantics provided by an advanced captioning model, promotes in-domain sample diversity for better generalization. Second, we emphasize the importance of effective model fine-tuning in few-shot recognition, proposing to use Low-Rank Adaptation (LoRA) for joint adaptation of the text and image encoders of a Vision-Language Model. We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification. Code is available at https://github.com/vturrisi/disef.
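The two components described in the abstract lend themselves to a compact illustration. The sketch below is not the authors' implementation (that is available at the repository linked above); the model checkpoints, hyperparameters, file path, and choice of captioner are assumptions made for the example. It shows (1) captioning a real few-shot image and conditioning an image-to-image diffusion model on both the caption and the image to produce an in-domain synthetic sample, and (2) attaching LoRA adapters to the attention projections of both the text and image encoders of a CLIP-style vision-language model so that only the low-rank parameters are fine-tuned.

```python
# Illustrative sketch only; not the DISEF code (see https://github.com/vturrisi/disef).
import torch
from PIL import Image
from transformers import pipeline, CLIPModel
from diffusers import StableDiffusionImg2ImgPipeline
from peft import LoraConfig, get_peft_model

# --- 1) In-domain synthetic augmentation: caption a real few-shot sample, then
#        condition an image-to-image diffusion model on the caption and the image.
#        BLIP and Stable Diffusion 1.5 are stand-ins, not the paper's exact models.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
sd = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

real_image = Image.open("few_shot_sample.jpg").convert("RGB")  # hypothetical path
caption = captioner(real_image)[0]["generated_text"]
synthetic = sd(prompt=caption, image=real_image, strength=0.6).images[0]

# --- 2) Parameter-efficient fine-tuning: LoRA adapters on the attention projections
#        of BOTH the text and image towers of a CLIP-style vision-language model.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
lora_cfg = LoraConfig(r=4, lora_alpha=16, lora_dropout=0.1,
                      target_modules=["q_proj", "v_proj"], bias="none")
clip = get_peft_model(clip, lora_cfg)  # only the low-rank adapters are trainable
clip.print_trainable_parameters()

# Fine-tuning would then mix real and synthetic images and optimize a classification
# loss derived from the image-text similarity logits (logits_per_image).
```

Because the adapters touch both encoders through a single configuration, the text and image representations are adapted jointly, which is the fine-tuning setting the abstract highlights.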

Authors (5)
  1. Victor G. Turrisi da Costa (5 papers)
  2. Nicola Dall'Asen (10 papers)
  3. Yiming Wang (141 papers)
  4. Nicu Sebe (270 papers)
  5. Elisa Ricci (137 papers)
Citations (1)