On Train-Test Class Overlap and Detection for Image Retrieval (2404.01524v1)
Abstract: How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean (GLDv2-clean), the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. By comparing the original GLDv2-clean and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings are striking. Not only is there a dramatic drop in performance, but the drop is inconsistent across methods, changing their ranking. What does it take to focus on objects of interest and ignore background clutter when indexing? Do we need to train an object detector and the representation separately? Do we need location supervision? We introduce Single-stage Detect-to-Retrieve (CiDeR), an end-to-end, single-stage pipeline to detect objects of interest and extract a global image representation. We outperform previous state-of-the-art methods on both existing training sets and the new RGLDv2-clean. Our dataset is available at https://github.com/dealicious-inc/RGLDv2-clean.
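The abstract does not say how overlapping classes were identified. The sketch below illustrates one plausible approach, assuming each image already has a global descriptor. It flags a training class as a *candidate* overlap when any of its images is close (by cosine similarity) to an evaluation query, then removes every image of each flagged class. The function names, the `(image_id, class_id, descriptor)` layout and the 0.8 threshold are hypothetical. This is not the procedure used to build RGLDv2-clean, and in practice flagged candidates would still need manual verification.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]
# Hypothetical layout: (image_id, class_id, global descriptor)
TrainItem = Tuple[str, int, Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two descriptors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def flag_overlapping_classes(
    eval_queries: Dict[str, Vector],
    train_images: List[TrainItem],
    threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> Set[int]:
    """Flag every training class with at least one image whose descriptor
    is within `threshold` cosine similarity of some evaluation query."""
    flagged: Set[int] = set()
    for q_vec in eval_queries.values():
        for _image_id, class_id, t_vec in train_images:
            if cosine(q_vec, t_vec) >= threshold:
                flagged.add(class_id)
    return flagged


def remove_classes(train_images: List[TrainItem], flagged: Set[int]) -> List[TrainItem]:
    """Drop all images of each flagged class, so the class leaves the training set entirely."""
    return [item for item in train_images if item[1] not in flagged]


if __name__ == "__main__":
    queries = {"oxford_q1": [1.0, 0.0]}
    train = [("a.jpg", 7, [0.99, 0.05]), ("b.jpg", 7, [0.0, 1.0]), ("c.jpg", 3, [0.0, 1.0])]
    flagged = flag_overlapping_classes(queries, train, threshold=0.9)
    print(flagged)                         # {7}
    print(remove_classes(train, flagged))  # [('c.jpg', 3, [0.0, 1.0])]
```

Note that `b.jpg` is removed even though it does not resemble the query: the sketch removes whole classes rather than individual images, since any remaining image of an overlapping landmark would still let that landmark's class leak into training.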
- Aggregating Local Deep Features for Image Retrieval. In ICCV, 2015.
- Neural codes for image retrieval. In ECCV, 2014.
- Unifying deep local and global features for image search. In ECCV, 2020.
- Efficient object embedding for spliced image retrieval. In CVPR, 2021.
- Rethinking atrous convolution for semantic image segmentation. In arXiv preprint arXiv:1706.05587, 2017.
- ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In CVPR, 2019.
- SuperPoint: Self-supervised interest point detection and description. In CVPRW, 2018.
- Deep image retrieval: Learning global representations for image search. In ECCV, 2016.
- Attention-aware generalized mean pooling for image retrieval. In arXiv preprint arXiv:1811.00202, 2018.
- What is the best practice for CNNs applied to visual instance retrieval? In ICLR, 2017.
- Bag of tricks for image classification with convolutional neural networks. In CVPR, 2018.
- Gather-excite: Exploiting feature context in convolutional neural networks. In NeurIPS, 2018.
- Squeeze-and-Excitation Networks. In CVPR, 2018.
- Aggregating local image descriptors into compact codes. In PAMI, 2011.
- Cross-dimensional weighting for aggregated deep convolutional features. In ECCV, 2016.
- A detect-then-retrieve model for multi-domain fashion item retrieval. In CVPRW, 2019.
- The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. In International Journal of Computer Vision, 2020.
- Which is plagiarism: Fashion image retrieval based on regional representation for design protection. In CVPR, 2020.
- Set transformer: A framework for attention-based permutation-invariant neural networks. In ICML, 2019.
- Correlation verification for image retrieval. In CVPR, 2022.
- Selective kernel networks. In CVPR, 2019.
- BoW image retrieval method based on SSD target detection. In IET Image Processing, 2020.
- D. Lowe. Distinctive image features from scale-invariant keypoints. In IJCV, 2004.
- TorchVision maintainers and contributors. TorchVision: PyTorch’s computer vision library. https://github.com/pytorch/vision, 2016.
- MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer. In arXiv preprint arXiv:2110.02178, 2021.
- Instance-level object retrieval via deep region CNN. In Multimedia Tools and Applications, 2019.
- SOLAR: Second-Order Loss and Attention for Image Retrieval. In ECCV, 2020.
- Large-scale image retrieval with attentive deep local features. In ICCV, 2017.
- PyTorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
- Leaf disease image retrieval with object detection and deep metric learning. In Frontiers in Plant Science, 2022.
- Object retrieval with large vocabularies and fast spatial matching. In CVPR, 2007.
- Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.
- Keep it SimPool: Who said supervised transformers suffer from attention deficit? In ICCV, 2023.
- Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking. In CVPR, 2018.
- CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. In ECCV, 2016.
- Fine-tuning CNN image retrieval with no human annotation. In TPAMI, 2019.
- Learning transferable visual models from natural language supervision. In ICML, 2021.
- Visual instance retrieval with deep convolutional networks. In CoRR, 2015.
- Object level deep feature pooling for compact image representation. In CVPRW, 2015.
- You only look once: Unified, real-time object detection. In CVPR, 2016.
- Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.
- ImageNet Large Scale Visual Recognition Challenge. In International Journal of Computer Vision, 2015.
- Faster R-CNN features for instance search. In CVPRW, 2016.
- Local Features and Visual Words Emerge in Activations. In CVPR, 2019.
- Graph-based particular object discovery. In Machine Vision and Applications, 30(2):243–254, 2019.
- All the attention you need: Global-local, spatial-channel attention for image retrieval. In WACV, 2022.
- Boosting vision transformers for image retrieval. In WACV, 2023.
- Detect-to-retrieve: Efficient regional aggregation for image search. In CVPR, 2019.
- To aggregate or not to aggregate: Selective match kernels for image search. In ICCV, 2013.
- Learning and aggregating deep local descriptors for instance-level recognition. In ECCV, 2020.
- Particular object retrieval with integral max-pooling of CNN activations. In ICLR, 2016.
- Augmenting convolutional networks with attention-based aggregation. In arXiv preprint arXiv:2112.13692, 2021.
- ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In CVPR, 2020.
- Non-local Neural Networks. In CVPR, 2018.
- Learning Super-Features for Image Retrieval. In ICLR, 2022.
- Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval. In CVPR, 2020.
- CBAM: Convolutional Block Attention Module. In ECCV, 2018.
- Learning token-based representation for image retrieval. 2022.
- DOLG: Single-stage image retrieval with deep orthogonal fusion of local and global features. In ICCV, 2021.
- Two-stage discriminative re-ranking for large-scale landmark retrieval. In CVPRW, 2020.
- Scaling vision transformers. In CVPR, 2022.
- Dataset-driven unsupervised object discovery for region-based instance image retrieval. In TPAMI, 2023.