Class-relevant Patch Embedding Selection for Few-Shot Image Classification (2405.03722v1)
Abstract: Effective image classification hinges on discerning relevant features from both foreground and background, with the foreground typically carrying the critical information. While humans classify images adeptly after limited exposure, artificial neural networks often struggle to select informative features from rare samples. To address this challenge, we propose a novel method for selecting class-relevant patch embeddings. We split support and query images into patches and encode them with a pre-trained Vision Transformer (ViT), which yields a class embedding and a set of patch embeddings for each image. For each image, we then compute the similarity between the class embedding and every patch embedding, sort the resulting similarity scores in descending order, and retain only the top-ranked patch embeddings, thereby filtering out class-irrelevant ones. The selected patch embeddings are fused with the class embedding to form a comprehensive image representation, enhancing pattern recognition across instances. This strategy effectively mitigates the influence of class-irrelevant patch embeddings and improves the performance of pre-trained models. Extensive experiments on popular few-shot classification benchmarks demonstrate the simplicity, efficacy, and computational efficiency of our approach, which outperforms state-of-the-art baselines in both 5-shot and 1-shot settings.
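To make the selection step concrete, here is a minimal PyTorch sketch of the procedure the abstract describes, not the authors' released implementation. The function name, the `top_k` value, cosine similarity as the similarity measure, and mean pooling as the fusion operator are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def select_class_relevant_patches(cls_emb, patch_embs, top_k=32):
    """Fuse the class embedding with its most similar patch embeddings.

    cls_emb:    (D,)  class ([CLS]) embedding from a pre-trained ViT
    patch_embs: (N, D) patch embeddings from the same forward pass
    """
    # Similarity between the class embedding and every patch embedding
    # (cosine similarity assumed here).
    sims = F.cosine_similarity(patch_embs, cls_emb.unsqueeze(0), dim=-1)  # (N,)

    # Sort in descending order and retain only the top-ranked patches,
    # discarding the class-irrelevant ones.
    top_idx = sims.argsort(descending=True)[:top_k]
    selected = patch_embs[top_idx]                                        # (top_k, D)

    # Fuse the retained patch embeddings with the class embedding into a
    # single image representation (mean pooling assumed as the fusion).
    image_repr = torch.cat([cls_emb.unsqueeze(0), selected], dim=0).mean(dim=0)
    return image_repr                                                     # (D,)
```

Under this sketch, support and query images would each be encoded by the frozen ViT, passed through this selection, and then compared (e.g., by nearest-prototype matching, as in metric-based few-shot pipelines) to produce the episode's predictions.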