
GEOBIND: Binding Text, Image, and Audio through Satellite Images (2404.11720v1)

Published 17 Apr 2024 in cs.AI

Abstract: In remote sensing, we are interested in modeling various modalities for a given geographic location. Several works have focused on learning the relationship between a location and its type of landscape, habitability, audio, textual descriptions, etc. Recently, a common way to approach these problems has been to train a deep-learning model that uses satellite images to infer some unique characteristics of the location. In this work, we present a deep-learning model, GeoBind, that can reason about multiple modalities, specifically text, image, and audio, from satellite imagery of a location. To do this, we use satellite images as the binding element and contrastively align all other modalities to the satellite image data. Our training results in a joint embedding space with multiple types of data: satellite image, ground-level image, audio, and text. Furthermore, our approach does not require a single complex dataset that contains all the modalities mentioned above. Rather, it requires only multiple datasets in which satellite images are paired with another modality. While we align only three modalities in this paper, we present a general framework that can be used to create an embedding space with any number of modalities by using satellite images as the binding element. Our results show that, unlike traditional unimodal models, GeoBind is versatile and can reason about multiple modalities for a given satellite image input.
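
To make the idea of contrastively aligning a modality to satellite images concrete, here is a minimal NumPy sketch of a symmetric InfoNCE objective, the CLIP-style loss commonly used for this kind of alignment. The abstract does not state which loss, temperature, or encoders GeoBind uses, so the function names, the `temperature` value, and the embedding shapes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(sat_emb, other_emb, temperature=0.07):
    """Hypothetical alignment loss between satellite embeddings and the
    embeddings of one other modality (e.g. audio or text).

    Row i of sat_emb and row i of other_emb are assumed to come from the
    same location (a positive pair); every other row in the batch serves
    as a negative. The temperature value is an assumption.
    """
    sat = l2_normalize(sat_emb)
    other = l2_normalize(other_emb)
    logits = sat @ other.T / temperature          # (batch, batch) similarities
    idx = np.arange(len(sat))
    loss_sat_to_other = -log_softmax(logits)[idx, idx].mean()
    loss_other_to_sat = -log_softmax(logits.T)[idx, idx].mean()
    return 0.5 * (loss_sat_to_other + loss_other_to_sat)

# Toy usage: correctly paired embeddings should score a lower loss than
# the same embeddings with the pairing shifted by one position.
rng = np.random.default_rng(0)
sat = rng.normal(size=(8, 16))
paired = sat + 0.01 * rng.normal(size=(8, 16))
mispaired = np.roll(paired, shift=1, axis=0)
print(symmetric_info_nce(sat, paired), symmetric_info_nce(sat, mispaired))
```

Under this reading, each additional modality would get its own loss term against the satellite embeddings, which is why only satellite-paired datasets are needed rather than one dataset containing every modality.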

Authors (5)
  1. Aayush Dhakal (13 papers)
  2. Subash Khanal (13 papers)
  3. Srikumar Sastry (13 papers)
  4. Adeel Ahmad (11 papers)
  5. Nathan Jacobs (70 papers)