FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models? (2307.04114v1)
Abstract: Few-shot learning aims to train models that can generalize to novel classes from only a few samples. Recently, a line of work has been proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules of the standard few-shot learning framework, such as visual prototypes and feature extractors, which prevents them from exploiting the full potential of semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of aligning visual features with textual embeddings obtained from a text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module that generalizes cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. We conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.
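The abstract names three ingredients: a similarity between visual features and class-name text embeddings that generalizes cosine similarity, a contrastive objective, and MAML-style bi-level optimization that adapts the metric module to each task. The sketch below illustrates how these pieces could fit together under stated assumptions; it is not the authors' implementation. It assumes the metric module is a learnable bilinear matrix `W` (so `W = I` recovers plain cosine similarity), uses softmax cross-entropy over class-name embeddings as the contrastive loss, and swaps full second-order MAML for the first-order approximation. The function names, the temperature `tau`, the learning rates, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Batch = Sequence[Tuple[Sequence[float], int]]  # (visual feature, class index)


def normalize(x: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def generalized_cosine(v: Sequence[float], t: Sequence[float], W: Matrix) -> float:
    """Hypothetical metric module: v_hat^T W t_hat. With W = I this is cosine similarity."""
    v_hat, t_hat = normalize(v), normalize(t)
    return sum(v_hat[i] * W[i][j] * t_hat[j]
               for i in range(len(v_hat)) for j in range(len(t_hat)))


def loss_and_grad(W: Matrix, batch: Batch, texts: Sequence[Sequence[float]],
                  tau: float) -> Tuple[float, Matrix]:
    """Mean contrastive (softmax cross-entropy) loss of images against class-name
    embeddings, plus its analytic gradient with respect to W."""
    rows, cols = len(W), len(W[0])
    grad = [[0.0] * cols for _ in range(rows)]
    t_hats = [normalize(t) for t in texts]
    total = 0.0
    for v, y in batch:
        v_hat = normalize(v)
        logits = [generalized_cosine(v, t, W) / tau for t in texts]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        total += -math.log(probs[y])
        # dL/dW = sum_c (p_c - 1[c == y]) * v_hat t_hat_c^T / tau
        for c, t_hat in enumerate(t_hats):
            coef = (probs[c] - (1.0 if c == y else 0.0)) / tau
            for i in range(rows):
                for j in range(cols):
                    grad[i][j] += coef * v_hat[i] * t_hat[j]
    n = len(batch)
    return total / n, [[g / n for g in row] for row in grad]


def sgd_step(W: Matrix, grad: Matrix, lr: float) -> Matrix:
    """Return W - lr * grad without modifying W in place."""
    return [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, grad)]


def adapt_to_task(W: Matrix, support: Batch, texts, tau: float = 1.0,
                  inner_lr: float = 0.1, steps: int = 1) -> Matrix:
    """Inner loop: a few gradient steps on the task's support set only."""
    for _ in range(steps):
        _, g = loss_and_grad(W, support, texts, tau)
        W = sgd_step(W, g, inner_lr)
    return W


def first_order_meta_step(W: Matrix, support: Batch, query: Batch, texts,
                          tau: float = 1.0, inner_lr: float = 0.1,
                          outer_lr: float = 0.01) -> Matrix:
    """Outer loop, first-order MAML approximation: the query-set gradient taken
    at the task-adapted matrix is applied to the shared initialization W."""
    W_task = adapt_to_task(W, support, texts, tau, inner_lr)
    _, g_query = loss_and_grad(W_task, query, texts, tau)
    return sgd_step(W, g_query, outer_lr)


# Toy usage with made-up 2-D visual features and class-name embeddings.
identity = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
support = [([0.6, 0.8], 0), ([0.8, 0.6], 1)]
query = [([0.9, 0.3], 0), ([0.2, 0.9], 1)]

before, _ = loss_and_grad(identity, support, texts, tau=1.0)
adapted = adapt_to_task(identity, support, texts, tau=1.0, inner_lr=0.1)
after, _ = loss_and_grad(adapted, support, texts, tau=1.0)
print(f"support loss: {before:.4f} -> {after:.4f}")
meta_W = first_order_meta_step(identity, support, query, texts)
```

The inner loop sees only the support set, while the outer update is driven by the query loss at the adapted parameters; this split is what makes the optimization bi-level. Full MAML would also backpropagate through the inner gradient step, which the first-order variant drops to keep the sketch short.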
Authors:
- Zihao Jiang
- Yunkai Dang
- Dong Pang
- Huishuai Zhang
- Weiran Huang