FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models? (2307.04114v1)

Published 9 Jul 2023 in cs.LG, cs.AI, cs.CL, cs.CV, and cs.MM

Abstract: Few-shot learning aims to train models that generalize to novel classes from only a few samples. Recently, a line of works has been proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules, such as visual prototypes and feature extractors, of the standard few-shot learning framework, which limits the full exploitation of the semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of aligning visual features with textual embeddings obtained from a text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module to generalize the cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. Moreover, we conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.
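
The abstract describes a pipeline in which image features are contrastively aligned with class-name embeddings from a pre-trained language model through a learnable metric that generalizes cosine similarity, with MAML-style bi-level optimization used to adapt that metric to each few-shot task. The PyTorch sketch below illustrates one plausible reading of that design; the class GeneralizedCosineMetric, its bilinear parameterization, the helper episode_contrastive_loss, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GeneralizedCosineMetric(nn.Module):
    """Hypothetical metric module that generalizes cosine similarity between
    visual features and class-name text embeddings via a learnable bilinear
    weight. The exact parameterization in the paper may differ."""

    def __init__(self, dim: int, temperature: float = 0.1):
        super().__init__()
        # Initialized to the identity so the module starts as plain cosine
        # similarity; MAML's inner loop can then adapt it per task.
        self.weight = nn.Parameter(torch.eye(dim))
        self.temperature = temperature

    def forward(self, visual: torch.Tensor, textual: torch.Tensor) -> torch.Tensor:
        # visual: (N, dim) image features; textual: (C, dim) class embeddings.
        v = F.normalize(visual @ self.weight, dim=-1)
        t = F.normalize(textual, dim=-1)
        return v @ t.t() / self.temperature  # (N, C) similarity logits


def episode_contrastive_loss(metric: GeneralizedCosineMetric,
                             image_feats: torch.Tensor,
                             text_embeds: torch.Tensor,
                             labels: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style loss: pull each image feature toward the embedding of its
    own class name and push it away from the other classes in the episode."""
    logits = metric(image_feats, text_embeds)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Random tensors stand in for the visual backbone and the textual branch.
    dim, n_way, n_query = 128, 5, 15
    metric = GeneralizedCosineMetric(dim)
    image_feats = torch.randn(n_way * n_query, dim)   # from visual encoder
    text_embeds = torch.randn(n_way, dim)             # from textual branch
    labels = torch.arange(n_way).repeat_interleave(n_query)
    loss = episode_contrastive_loss(metric, image_feats, text_embeds, labels)
    loss.backward()  # in full training this sits inside MAML's bi-level loop
```

In a complete training loop, the metric's weight would be adapted on each episode's support set in MAML's inner loop and meta-updated across episodes in the outer loop, which is what the abstract means by letting the metric module adapt to different few-shot tasks.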

Authors (5)
  1. Zihao Jiang (12 papers)
  2. Yunkai Dang (5 papers)
  3. Dong Pang (1 paper)
  4. Huishuai Zhang (64 papers)
  5. Weiran Huang (54 papers)
Citations (3)
