Composed Image Retrieval for Remote Sensing (2405.15587v3)
Abstract: This work introduces composed image retrieval to remote sensing. It allows a large image archive to be queried with an example image complemented by a textual description, which gives the query more descriptive power than a unimodal query, whether visual or textual. The textual part can modify various attributes, such as shape, color, or context. We introduce a novel method that fuses image-to-image and text-to-image similarity. We demonstrate that a vision-language model has sufficient descriptive power, so no further learning step or training data is necessary. We present a new evaluation benchmark focused on color, context, density, existence, quantity, and shape modifications. Our work sets the state of the art for this task and is a first step toward closing a gap in remote sensing image retrieval. Code at: https://github.com/billpsomas/rscir
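The abstract does not spell out how the two similarities are fused. The sketch below is a minimal, training-free illustration under stated assumptions, not the authors' implementation. It assumes the query image, the query text, and every archive image are already embedded in a shared vision-language space, such as CLIP's. It then min-max normalizes the image-to-image and text-to-image scores per query and combines them with a weighted sum controlled by a hypothetical weight `lam`. All function names are illustrative, and the 2-D toy vectors are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def min_max(scores):
    """Rescale one query's scores to [0, 1] so both modalities share a range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fused_scores(query_img, query_txt, database, lam=0.5):
    """Weighted sum of normalized image-to-image and text-to-image scores.

    lam = 0 gives pure image retrieval; lam = 1 gives pure text retrieval.
    The weighting scheme is an assumption for illustration.
    """
    s_img = min_max([cosine(query_img, x) for x in database])
    s_txt = min_max([cosine(query_txt, x) for x in database])
    return [lam * t + (1 - lam) * i for i, t in zip(s_img, s_txt)]

def rank(query_img, query_txt, database, lam=0.5):
    """Archive indices sorted from most to least similar (ties keep input order)."""
    scores = fused_scores(query_img, query_txt, database, lam)
    return sorted(range(len(database)), key=lambda k: scores[k], reverse=True)

if __name__ == "__main__":
    # Toy archive: item 0 matches only the image, item 1 only the text,
    # item 2 partly matches both (the "modified" scene we want).
    db = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    print(rank([1.0, 0.0], [0.0, 1.0], db, lam=0.5))
```

With equal weighting, the archive item that agrees with both the visual example and the textual modification ranks first. Setting `lam` to 0 or 1 falls back to a unimodal query. Normalizing before fusing prevents one modality from dominating simply because its raw cosine scores are on a different scale.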