Fewshot learning on global multimodal embeddings for earth observation tasks (2310.00119v2)
Abstract: In this work we pretrain a CLIP/ViT-based model using three different modalities of satellite imagery across five AOIs covering over ~10\% of Earth's total landmass, namely Sentinel 2 RGB optical imagery, Sentinel 1 SAR radar amplitude and interferometric coherence. The model has $\sim 250$M parameters. We then use the embeddings produced for each modality with a classical machine learning method to attempt different downstream tasks for earth observation related to vegetation, built-up surface, croplands and permanent water. We consistently show that this reduces the need for labeled data by 99\%, so that with ~200-500 randomly selected labeled examples (around 4K-10K km$^2$) we reach performance levels analogous to those achieved with the full labeled datasets (about 150K image chips or 3M km$^2$ in each area of interest, AOI) on all modalities, AOIs and downstream tasks. This leads us to think that the model has captured significant earth features useful in a wide variety of scenarios. To enhance our model's usability in practice, its architecture allows inference in contexts with missing modalities and even missing channels within each modality. Additionally, we visually show that this embedding space, obtained with no labels, is sensitive to the different earth features represented by the labeled datasets we selected.
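The figures quoted in the abstract can be checked against each other with a little arithmetic. The sketch below is only a consistency check on the rounded numbers above; it assumes every image chip covers the same area, and the function and constant names are illustrative rather than taken from the paper.

```python
# Approximate figures as quoted in the abstract (per AOI).
FULL_CHIPS = 150_000        # labeled image chips in the full dataset
FULL_AREA_KM2 = 3_000_000   # area covered by the full labeled dataset
FEWSHOT_SIZES = (200, 500)  # randomly selected labeled examples


def area_per_chip_km2(full_area_km2, full_chips):
    """Average area per chip, assuming all chips cover equal area."""
    return full_area_km2 / full_chips


def fewshot_summary(n_labeled, full_chips=FULL_CHIPS, full_area_km2=FULL_AREA_KM2):
    """Area covered by n_labeled chips and the implied label reduction."""
    chip_area = area_per_chip_km2(full_area_km2, full_chips)
    fraction_used = n_labeled / full_chips
    return {
        "n_labeled": n_labeled,
        "area_km2": n_labeled * chip_area,
        "label_reduction_pct": 100.0 * (1.0 - fraction_used),
    }


if __name__ == "__main__":
    for n in FEWSHOT_SIZES:
        s = fewshot_summary(n)
        print(
            f"{s['n_labeled']} chips: {s['area_km2']:.0f} km^2, "
            f"label reduction {s['label_reduction_pct']:.2f}%"
        )
```

Under these assumptions each chip covers about 20 km$^2$, so 200-500 chips correspond to roughly 4K-10K km$^2$ and to well under 1\% of the full labeled set, which agrees with the 99\% reduction stated above.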