Hybrid Feature Collaborative Reconstruction Network for Few-Shot Fine-Grained Image Classification (2407.02123v1)
Abstract: Our research focuses on few-shot fine-grained image classification, which faces two major challenges: the appearance similarity of fine-grained objects and the limited number of samples. To preserve the appearance details of images, traditional feature reconstruction networks usually enhance the representation ability of key features by reconstructing spatial features and minimizing the reconstruction error. However, we find that relying solely on a single type of feature is insufficient for accurately capturing the inter-class differences of fine-grained objects when samples are scarce. In contrast, introducing channel features provides an additional information dimension, helping to better understand and distinguish the inter-class differences of fine-grained objects. Therefore, in this paper, we design a new Hybrid Feature Collaborative Reconstruction Network (HFCR-Net) for few-shot fine-grained image classification, which includes a Hybrid Feature Fusion Process (HFFP) and a Hybrid Feature Reconstruction Process (HFRP). In HFFP, we fuse the channel features and the spatial features: through dynamic weight adjustment, we aggregate the spatial dependencies between any two positions and the correlations between different channels of each image to increase the inter-class differences. Additionally, we introduce reconstruction along the channel dimension in HFRP. Through the collaborative reconstruction of the channel dimension and the spatial dimension, the inter-class differences are further increased during support-to-query reconstruction, while the intra-class differences are reduced during query-to-support reconstruction. Finally, extensive experiments on three widely used fine-grained datasets demonstrate the effectiveness and superiority of our approach.
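The collaborative reconstruction idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes feature maps are flattened to a (positions, channels) matrix, uses the closed-form ridge-regression reconstruction common in feature-reconstruction few-shot methods, and combines the spatial-dimension error (positions as reconstruction atoms) with the channel-dimension error (the transposed map). All function names and the regularization value are illustrative assumptions.

```python
import numpy as np

def ridge_reconstruct(query, support, lam=0.1):
    """Reconstruct each row of `query` as a linear combination of the
    rows of `support` via closed-form ridge regression:
    recon = Q S^T (S S^T + lam*I)^{-1} S."""
    k = support.shape[0]
    gram = support @ support.T + lam * np.eye(k)
    weights = query @ support.T @ np.linalg.inv(gram)
    return weights @ support

def hybrid_reconstruction_error(query_map, support_map, lam=0.1):
    """Collaborative spatial + channel reconstruction error between two
    (positions, channels) feature maps. The spatial term reconstructs
    position vectors from support positions; the channel term transposes
    both maps so channel vectors are reconstructed from support channels."""
    spatial = np.mean(
        (query_map - ridge_reconstruct(query_map, support_map, lam)) ** 2)
    channel = np.mean(
        (query_map.T - ridge_reconstruct(query_map.T, support_map.T, lam)) ** 2)
    return spatial + channel
```

A query map that lies in the span of the support map yields a near-zero error, while an unrelated map does not, which is what makes the summed error usable as a classification distance.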
Authors: Shulei Qiu, Wanqi Yang, Ming Yang