
USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery (2312.02199v1)

Published 2 Dec 2023 in cs.CV, cs.AI, cs.LG, eess.IV, and stat.AP

Abstract: Large, self-supervised vision models have led to substantial advancements for automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data which has rich structure with multi-sensor, multi-spectral, and temporal information providing massive amounts of self-labeled data that can be used for self-supervised pre-training. In this work, we develop a new encoder architecture called USat that can input multi-spectral data from multiple sensors for self-supervised pre-training. USat is a vision transformer with modified patch projection layers and positional encodings to model spectral bands with varying spatial scales from multiple sensors. We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art self-supervised MAE models trained on remote sensing data on multiple remote sensing benchmark datasets (up to 8%) and leads to improvements in low data regimes (up to 7%). Code and pre-trained weights are available at https://github.com/stanfordmlgroup/USat .


Summary

  • The paper introduces USat, a self-supervised vision transformer designed for multi-sensor satellite imagery that eliminates the need for extensive labeled data.
  • It introduces modified patch projection layers and positional encodings to integrate spectral bands with varying spatial resolutions, achieving up to 8% improvement on benchmarks.
  • USat's flexible architecture supports arbitrary spectral band combinations, reducing computational loads and enhancing performance in low-data regimes.

In the field of satellite imaging, leveraging the vast amounts of data collected from Earth observation satellites is paramount for a multitude of applications ranging from agriculture and energy to disaster response and climate monitoring. A recent development in this field is the creation of a new encoder architecture known as USat, designed for multi-sensor satellite imagery. This architecture is particularly innovative because it is trained in a self-supervised manner, meaning it learns to interpret the data without the need for manually labeled datasets, which are often expensive and time-consuming to produce.

USat, developed by researchers at Stanford University, is a vision transformer adapted to accommodate multi-spectral data from multiple sensors. It achieves this by introducing modified patch projection layers and positional encodings, which allow it to process spectral bands with various spatial scales.
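The core idea behind the modified patch projection can be sketched with a small calculation. This is an illustrative sketch, not the authors' implementation: the band names, image sizes, and ground sampling distances (GSDs) below are assumed example values, and `patch_grid` is a hypothetical helper. The point it demonstrates is that if each band's patch size (in pixels) is chosen so that every patch covers the same ground extent, bands from different sensors tile into the same spatial grid and can share positional encodings.

```python
# Illustrative sketch: pick a per-band patch size so that patches from
# sensors with different ground sampling distances (GSD, meters/pixel)
# all cover the same ground area. Band names and GSDs are example
# assumptions, not the paper's exact configuration.

GROUND_PATCH_METERS = 80  # target ground extent covered by one patch

bands = {
    # band group: (image size in pixels, GSD in meters/pixel)
    "S2-RGB":  (96, 10),   # Sentinel-2 10 m bands
    "S2-SWIR": (48, 20),   # Sentinel-2 20 m bands
    "NAIP":    (960, 1),   # high-resolution aerial imagery
}

def patch_grid(image_px: int, gsd_m: float, ground_patch_m: float):
    """Return (patch_size_px, patches_per_side) for one band."""
    patch_px = int(ground_patch_m / gsd_m)         # pixels per patch
    assert image_px % patch_px == 0, "image must tile evenly into patches"
    return patch_px, image_px // patch_px

for name, (px, gsd) in bands.items():
    p, n = patch_grid(px, gsd, GROUND_PATCH_METERS)
    print(f"{name}: {p}x{p} px patches, {n}x{n} grid")
```

With these example numbers every band produces the same 12x12 grid of ground-aligned patches, which is what lets a single spatial positional encoding be shared across sensors with very different native resolutions.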

In more concrete terms, USat is integrated into a Masked Autoencoder (MAE) self-supervised pre-training procedure. This method trains the encoder to predict the parts of the input image that are masked (hidden) from the visible parts, thus learning a robust representation of the data. The primary benefit of USat over previous models is its ability to handle arbitrary collections of images with different sets of spectral bands and ground sampling distances, reducing computational load and improving downstream performance.
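The masking step of this procedure can be sketched as follows. This is a minimal illustration of the standard MAE recipe, not the paper's exact sampling scheme: the function name is hypothetical, and the 75% mask ratio is the common MAE default rather than a value confirmed by the source.

```python
import random

def random_mask(num_patches: int, mask_ratio: float = 0.75, seed: int = 0):
    """Split patch indices into visible and masked sets, MAE-style.

    The encoder only processes the visible subset; a lightweight decoder
    is then trained to reconstruct the pixels of the masked patches.
    Illustrative sketch, not the paper's exact implementation.
    """
    rng = random.Random(seed)
    idx = list(range(num_patches))
    rng.shuffle(idx)                       # random permutation of patches
    n_masked = int(num_patches * mask_ratio)
    masked, visible = idx[:n_masked], idx[n_masked:]
    return sorted(visible), sorted(masked)

# e.g. a 12x12 patch grid with a 75% mask ratio
visible, masked = random_mask(num_patches=144, mask_ratio=0.75)
print(len(visible), len(masked))  # 36 108
```

Because the encoder sees only the unmasked quarter of the patches, pre-training is both cheaper per image and forces the model to infer missing spectral and spatial content from context.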

Experiments show that USat outperforms single-sensor pre-training approaches in both image interpretation accuracy and efficiency in scenarios where labeled data is scarce. They demonstrate that leveraging data from multiple sensors significantly boosts the model's learning capacity compared to using a single type of sensor data.

An especially compelling aspect of USat is the performance of its self-supervised MAE model, which has been evaluated on various benchmark datasets. Improvements of up to 8% on multiple remote sensing benchmarks and up to 7% in low-data regimes were observed, an encouraging leap forward for the field.

Moreover, the USat architecture can support a flexible selection of spectral bands for fine-tuning the model, thereby increasing adaptability for varied practical applications. The researchers have also made sure that key resources, such as the code and pre-trained weights, are readily available to the public, encouraging further development and application of their work.

In summary, USat represents a step forward in the efficient and effective interpretation of multi-sensor satellite imagery. With its self-supervised learning methodology, it holds the promise of advancing geographical analysis and various crucial applications of satellite data, marking a significant stride toward autonomous and agile satellite image processing.
