Diversified in-domain synthesis with efficient fine-tuning for few-shot classification (2312.03046v2)
Abstract: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. A recent research direction for improving few-shot classifiers involves augmenting the labeled samples with synthetic images created by state-of-the-art text-to-image generation models. Following this trend, we propose Diversified In-domain Synthesis with Efficient Fine-tuning (DISEF), a novel approach which addresses the generalization challenge in few-shot learning using synthetic data. DISEF consists of two main components. First, we propose a novel text-to-image augmentation pipeline that, by leveraging the real samples and their rich semantics coming from an advanced captioning model, promotes in-domain sample diversity for better generalization. Second, we emphasize the importance of effective model fine-tuning in few-shot recognition, proposing to use Low-Rank Adaptation (LoRA) for joint adaptation of the text and image encoders in a vision-language model. We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification. Code is available at https://github.com/vturrisi/disef.
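The core idea behind LoRA, which the abstract proposes for jointly adapting the text and image encoders, is to freeze a pretrained weight matrix and learn only a low-rank additive update. The sketch below is a minimal NumPy illustration of that mechanism, not the paper's implementation; all names, shapes, and the scaling convention (`alpha / r`, with the up-projection zero-initialized so training starts from the frozen model) are assumptions based on the standard LoRA formulation.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0, r=4):
    """Linear layer with a LoRA update: y = x @ (W + (alpha/r) * B @ A).T
    W is frozen; only the low-rank factors A and B are trained."""
    return x @ (W + (alpha / r) * (B @ A)).T

# Hypothetical shapes for illustration: a (d_out x d_in) frozen weight
# and rank-r factors, as in the standard LoRA parameterization.
d_in, d_out, r = 8, 6, 2
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

x = rng.normal(size=(3, d_in))
# With B zero-initialized, the adapted layer reproduces the frozen layer,
# so fine-tuning starts exactly from the pretrained model's behavior.
assert np.allclose(lora_forward(x, W, A, B, r=r), x @ W.T)
```

In practice such updates are attached to the attention projections of both CLIP encoders (e.g. via a PEFT library), so only the rank-r factors are optimized while the backbone stays frozen.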
- Synthetic data from diffusion models improves ImageNet classification. Transactions on Machine Learning Research (TMLR), 2023.
- Food-101 – Mining discriminative components with random forests. In Proceedings of the European Conference on Computer Vision (ECCV), 2014.
- One-for-all: Generalized lora for parameter-efficient fine-tuning. arXiv preprint arXiv:2306.07967, 2023.
- Describing textures in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
- Randaugment: Practical automated data augmentation with a reduced search space. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2006.
- Clip-adapter: Better vision-language models with feature adapters. International Journal of Computer Vision (IJCV), 2023a.
- Llama-adapter v2: Parameter-efficient visual instruction model. arXiv preprint arXiv:2304.15010, 2023b.
- Is synthetic data from generative models ready for image recognition? In Proceedings of the International Conference on Learning Representations (ICLR), 2023.
- Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2019.
- Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
- Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
- Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning (ICML), 2019.
- LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations (ICLR), 2022.
- Visual prompt tuning. arXiv preprint arXiv:2203.12119, 2022.
- Few-shot metric learning: Online adaptation of embedding for retrieval. In Proceedings of the Asian Conference on Computer Vision (ACCV), 2022.
- Maple: Multi-modal prompt learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- 3D object representations for fine-grained categorization. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) workshops, 2013.
- Deep metric learning for few-shot image classification: A review of recent developments. Pattern Recognition, 2023.
- Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the Association for Computational Linguistics (ACL), 2021.
- Explore the power of synthetic data on few-shot object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- Improved baselines with visual instruction tuning. arXiv preprint arXiv:2310.03744, 2023a.
- Visual instruction tuning. In Advances in Neural Information Processing Systems (NeurIPS), 2023b.
- Decoupled weight decay regularization. In Proceedings of the International Conference on Learning Representations (ICLR), 2019.
- Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2023.
- Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151, 2013.
- When does label smoothing help? In Advances in Neural Information Processing Systems (NeurIPS), 2019.
- GLIDE: towards photorealistic image generation and editing with text-guided diffusion models. In Proceedings of the International Conference on Machine Learning (ICML), 2022.
- Automated flower classification over a large number of classes. In Proceedings of the Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP), 2008.
- Cats and dogs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
- Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning (ICML), 2021.
- Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research (JMLR), 2020.
- Learning multiple visual domains with residual adapters. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 2015.
- Photorealistic text-to-image diffusion models with deep language understanding. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
- Fake it till you make it: Learning transferable representations from synthetic ImageNet clones. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- Logoprompt: Synthetic text images can be good visual prompts for vision-language models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023.
- Diversity is definitely needed: Improving model-agnostic zero-shot classification via stable diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) workshops, 2023.
- Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML), 2015.
- Stablerep: Synthetic images from text-to-image models make strong visual representation learners. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
- SUN database: Large-scale scene recognition from abbey to zoo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
- Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
- Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023a.
- Llama-adapter: Efficient fine-tuning of language models with zero-init attention. arXiv preprint arXiv:2303.16199, 2023b.
- Learning to prompt for vision-language models. International Journal of Computer Vision (IJCV), 2022a.
- Conditional prompt learning for vision-language models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022b.
- Training on thin air: Improve image classification with generated data. arXiv preprint arXiv:2305.15316, 2023.
- A comprehensive survey on transfer learning. Proceedings of the IEEE, 2019.
- Victor G. Turrisi da Costa
- Nicola Dall'Asen
- Yiming Wang
- Nicu Sebe
- Elisa Ricci