Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition (2401.07061v2)
Abstract: Learning to recognize novel concepts from only a few image samples is very challenging, since a model trained on such scarce data easily overfits and generalizes poorly. One promising yet underexplored solution is to compensate for the novel classes by generating plausible samples. However, most existing works along this line exploit visual information only, so the generated data are easily distracted by challenging factors present in the few available samples. Motivated by the semantic information in the textual modality, which reflects human concepts, this work proposes a novel framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition. The proposed framework generates more diverse and reasonable samples for novel classes through effective information transfer from base classes. Specifically, an instance-view data hallucination module hallucinates each sample of a novel class into new data via local semantic-correlated attention and global semantic feature fusion derived from base classes. Meanwhile, a prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class and its associated distribution from the few samples, thereby providing the prototype as a more stable sample and enabling the resampling of a large number of virtual samples. We conduct extensive experiments and comparisons with state-of-the-art methods on several popular few-shot benchmarks to verify the effectiveness of the proposed framework.
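The prototype-view idea — estimating a novel class's prototype and distribution from a few samples with help from semantically related base classes, then resampling virtual features — can be illustrated with a minimal sketch. This is not the paper's actual method; the function name, the Gaussian resampling, the top-k selection, and the blending weight `alpha` are all illustrative assumptions.

```python
import numpy as np

def hallucinate_prototype(support_feats, base_means, base_covs, sem_sim,
                          k=2, n_samples=100, alpha=0.5, seed=None):
    """Hypothetical sketch of prototype-view hallucination: transfer
    statistics from the k most semantically similar base classes, then
    resample virtual features for the novel class.

    support_feats : (n_shot, d) features of the few novel-class samples
    base_means    : (n_base, d) per-base-class feature means
    base_covs     : (n_base, d, d) per-base-class feature covariances
    sem_sim       : (n_base,) semantic similarity of each base class
                    to the novel class (e.g. from word embeddings)
    """
    # Naive prototype: mean of the few available support features.
    naive_proto = support_feats.mean(axis=0)

    # Pick the k base classes closest in the semantic (textual) space.
    top = np.argsort(-sem_sim)[:k]
    w = sem_sim[top] / sem_sim[top].sum()  # normalized semantic weights

    # Calibrated prototype: blend the visual mean with semantically
    # weighted base-class means (alpha controls the blend; assumed here).
    proto = alpha * naive_proto + (1 - alpha) * (w[:, None] * base_means[top]).sum(axis=0)

    # Borrow a covariance estimate from the related base classes.
    cov = (w[:, None, None] * base_covs[top]).sum(axis=0)

    # Resample a large number of virtual features around the prototype.
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(proto, cov, size=n_samples)
    return proto, samples
```

A classifier for the novel class could then be trained on `samples` together with the original support features, which is the general recipe behind distribution-calibration-style few-shot methods.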