USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery (2312.02199v1)
Abstract: Large, self-supervised vision models have led to substantial advancements in the automatic interpretation of natural images. Recent works have begun tailoring these methods to remote sensing data, whose rich multi-sensor, multi-spectral, and temporal structure provides massive amounts of self-labeled data for self-supervised pre-training. In this work, we develop a new encoder architecture called USat that can ingest multi-spectral data from multiple sensors for self-supervised pre-training. USat is a vision transformer with modified patch projection layers and positional encodings that model spectral bands with varying spatial scales from multiple sensors. We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art self-supervised MAE models trained on remote sensing data on multiple remote sensing benchmark datasets (by up to 8%) and leads to improvements in low-data regimes (by up to 7%). Code and pre-trained weights are available at https://github.com/stanfordmlgroup/USat .
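The abstract only sketches the central architectural idea: each sensor's spectral bands receive their own patch projection sized to that band group's spatial resolution, and the resulting patches are combined into one token sequence with positional encodings that account for the differing spatial scales. The code below is a minimal, hypothetical illustration of that idea, not the authors' implementation; the band groups, image sizes, patch sizes, and module names (e.g. `MultiSensorPatchEmbed`, `s2_rgb`, `naip_rgb`) are placeholders I introduce for the example. The actual encoder and pre-trained weights are in the linked repository.

```python
# Illustrative sketch of a multi-sensor patch embedding in the spirit of the
# USat description: per band-group patch projections at sensor-specific
# spatial scales, mapped into one shared token sequence. All band groupings,
# sizes, and names below are hypothetical placeholders.
import torch
import torch.nn as nn


class MultiSensorPatchEmbed(nn.Module):
    """One patch projection per (sensor, band group), each with its own patch
    size so that coarser bands yield fewer, larger patches."""

    def __init__(self, embed_dim=768, specs=None):
        super().__init__()
        # specs: name -> (num_bands, image_size_px, patch_size_px); made-up values
        specs = specs or {
            "s2_rgb":   (3, 96, 8),    # e.g. Sentinel-2 10 m bands
            "s2_swir":  (2, 48, 4),    # e.g. Sentinel-2 20 m bands, half resolution
            "naip_rgb": (3, 384, 32),  # e.g. NAIP high-resolution bands
        }
        self.specs = specs
        # Per band-group patch projection (a strided conv, as in a standard ViT)
        self.projs = nn.ModuleDict({
            name: nn.Conv2d(bands, embed_dim, kernel_size=patch, stride=patch)
            for name, (bands, _, patch) in specs.items()
        })
        # Learned positional encodings, one per patch grid of each band group
        self.pos = nn.ParameterDict({
            name: nn.Parameter(torch.zeros(1, (size // patch) ** 2, embed_dim))
            for name, (_, size, patch) in specs.items()
        })

    def forward(self, inputs):
        """inputs: dict name -> tensor of shape (B, num_bands, size, size).
        Returns a single token sequence concatenated across band groups."""
        tokens = []
        for name, x in inputs.items():
            t = self.projs[name](x)           # (B, D, H', W')
            t = t.flatten(2).transpose(1, 2)  # (B, H'*W', D)
            tokens.append(t + self.pos[name])
        return torch.cat(tokens, dim=1)       # (B, total_patches, D)


if __name__ == "__main__":
    embed = MultiSensorPatchEmbed()
    batch = {
        "s2_rgb":   torch.randn(2, 3, 96, 96),
        "s2_swir":  torch.randn(2, 2, 48, 48),
        "naip_rgb": torch.randn(2, 3, 384, 384),
    }
    print(embed(batch).shape)  # torch.Size([2, 432, 768]) for the sizes above
```

Under this scheme, a band group at half the spatial resolution can use half the patch size, so every band group's tokens cover the same ground footprint; in an MAE-style pre-training loop, a random subset of these tokens would then be masked and the model trained to reconstruct the missing patches.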