
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines (2404.15771v1)

Published 24 Apr 2024 in cs.CV and cs.MM

Abstract: Fine-grained image retrieval (FGIR) aims to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis that leads to practical guidelines for identifying subcategory-specific discrepancies and generating discriminative features when designing effective FGIR models. These guidelines are: emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing an effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain visual transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a semantic-oriented module, which magnify objects and identify discriminative regions, respectively. Following G3, we implement a discriminative model training strategy to improve the discriminability and generalization ability of DVF. Extensive analysis and ablation studies confirm the efficacy of our proposed guidelines. Without bells and whistles, the proposed DVF achieves state-of-the-art performance on three widely used fine-grained datasets in closed-set and open-set settings.

