
Diversified in-domain synthesis with efficient fine-tuning for few-shot classification (2312.03046v2)

Published 5 Dec 2023 in cs.CV

Abstract: Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class. A recent research direction for improving few-shot classifiers augments the labeled samples with synthetic images created by state-of-the-art text-to-image generation models. Following this trend, we propose Diversified In-domain Synthesis with Efficient Fine-tuning (DISEF), a novel approach that addresses the generalization challenge in few-shot learning using synthetic data. DISEF consists of two main components. First, we propose a novel text-to-image augmentation pipeline that, by leveraging the real samples and the rich semantics provided by an advanced captioning model, promotes in-domain sample diversity for better generalization. Second, we emphasize the importance of effective model fine-tuning in few-shot recognition, proposing to use Low-Rank Adaptation (LoRA) for joint adaptation of the text and image encoders of a Vision-Language Model. We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification. Code is available at https://github.com/vturrisi/disef.
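The two components described in the abstract lend themselves to a compact illustration. The sketch below is not the authors' implementation (that is available at the repository linked above); the model checkpoints, hyperparameters, file path, and choice of captioner are assumptions made for the example. It shows (1) captioning a real few-shot image and conditioning an image-to-image diffusion model on both the caption and the image to produce an in-domain synthetic sample, and (2) attaching LoRA adapters to the attention projections of both the text and image encoders of a CLIP-style vision-language model so that only the low-rank parameters are fine-tuned.

```python
# Illustrative sketch only; not the DISEF code (see https://github.com/vturrisi/disef).
import torch
from PIL import Image
from transformers import pipeline, CLIPModel
from diffusers import StableDiffusionImg2ImgPipeline
from peft import LoraConfig, get_peft_model

# --- 1) In-domain synthetic augmentation: caption a real few-shot sample, then
#        condition an image-to-image diffusion model on the caption and the image.
#        BLIP and Stable Diffusion 1.5 are stand-ins, not the paper's exact models.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
sd = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

real_image = Image.open("few_shot_sample.jpg").convert("RGB")  # hypothetical path
caption = captioner(real_image)[0]["generated_text"]
synthetic = sd(prompt=caption, image=real_image, strength=0.6).images[0]

# --- 2) Parameter-efficient fine-tuning: LoRA adapters on the attention projections
#        of BOTH the text and image towers of a CLIP-style vision-language model.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
lora_cfg = LoraConfig(r=4, lora_alpha=16, lora_dropout=0.1,
                      target_modules=["q_proj", "v_proj"], bias="none")
clip = get_peft_model(clip, lora_cfg)  # only the low-rank adapters are trainable
clip.print_trainable_parameters()

# Fine-tuning would then mix real and synthetic images and optimize a classification
# loss derived from the image-text similarity logits (logits_per_image).
```

Because the adapters touch both encoders through a single configuration, the text and image representations are adapted jointly, which is the fine-tuning setting the abstract highlights.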

Authors (5)
  1. Victor G. Turrisi da Costa (5 papers)
  2. Nicola Dall'Asen (10 papers)
  3. Yiming Wang (141 papers)
  4. Nicu Sebe (270 papers)
  5. Elisa Ricci (137 papers)
Citations (1)