
Bridging Remote Sensors with Multisensor Geospatial Foundation Models (2404.01260v1)

Published 1 Apr 2024 in cs.CV, cs.AI, and cs.LG

Abstract: In the realm of geospatial analysis, the diversity of remote sensors, encompassing both optical and microwave technologies, offers a wealth of distinct observational capabilities. Recognizing this, we present msGFM, a multisensor geospatial foundation model that effectively unifies data from four key sensor modalities. This integration spans an expansive dataset of two million multisensor images. msGFM is uniquely adept at handling both paired and unpaired sensor data. For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach in masked image modeling, enabling the synthesis of joint representations from diverse sensors. msGFM, incorporating four remote sensors, upholds strong performance, forming a comprehensive model adaptable to various sensor types. msGFM has demonstrated enhanced proficiency in a range of both single-sensor and multisensor downstream tasks. These include scene classification, segmentation, cloud removal, and pan-sharpening. A key discovery of our research is that representations derived from natural images are not always compatible with the distinct characteristics of geospatial remote sensors, underscoring the limitations of existing representations in this field. Our work can serve as a guide for developing multisensor geospatial pretraining models, paving the way for more advanced geospatial capabilities.
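The abstract's core idea, cross-sensor pretraining via masked image modeling for data from identical geolocations, can be illustrated with a toy sketch. The snippet below is not the authors' implementation; it is a minimal NumPy illustration of the general MAE/SimMIM-style flow the abstract references: patchify a co-located optical/SAR pair, mask a fraction of the optical patches, and reconstruct them from the visible content of both sensors. The "decoder" here is a deliberate placeholder (a mean over visible pixels) standing in for the learned joint-representation network.

```python
import numpy as np

def mask_patches(image, patch_size=4, mask_ratio=0.75, rng=None):
    """Split a (C, H, W) image into a patch grid and mask a random subset,
    as in masked image modeling (MAE/SimMIM). Returns a boolean patch mask."""
    if rng is None:
        rng = np.random.default_rng(0)
    _, h, w = image.shape
    ph, pw = h // patch_size, w // patch_size
    n = ph * pw
    n_masked = int(n * mask_ratio)
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, n_masked, replace=False)] = True
    return mask.reshape(ph, pw)

def cross_sensor_mim_step(optical, sar, patch_size=4, mask_ratio=0.75):
    """Toy cross-sensor MIM step: mask optical patches, then reconstruct them
    using the visible pixels of BOTH co-located sensors. The 'prediction' is a
    placeholder (mean of visible pixels) standing in for a learned decoder."""
    mask = mask_patches(optical, patch_size, mask_ratio)
    # Upsample the patch-level mask to a pixel-level mask.
    pixel_mask = np.kron(mask, np.ones((patch_size, patch_size), dtype=bool))
    # Pool visible content from both sensors (the "joint representation" stand-in).
    visible = np.concatenate(
        [optical[:, ~pixel_mask].ravel(), sar[:, ~pixel_mask].ravel()]
    )
    pred = np.full_like(optical, visible.mean())
    # Reconstruction loss is computed only on the masked optical patches.
    loss = np.mean((pred[:, pixel_mask] - optical[:, pixel_mask]) ** 2)
    return mask, loss
```

In a real pipeline, `pred` would come from a transformer encoder-decoder consuming tokens from all sensors, and the shared geolocation is what makes the non-masked sensor informative about the masked one.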
