
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery (2403.05419v1)

Published 8 Mar 2024 in cs.CV

Abstract: Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks by pre-training on large amounts of unlabelled data. Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data. Unlike standard natural image datasets, remote sensing data is acquired from various sensor technologies and exhibits a diverse range of scale variations as well as modalities. Existing satellite image pre-training methods either ignore the scale information present in remote sensing imagery or restrict themselves to a single data modality. In this paper, we revisit transformer pre-training and leverage multi-scale information that is effectively utilized with multiple modalities. Our proposed approach, named SatMAE++, performs multi-scale pre-training and utilizes convolution-based upsampling blocks to reconstruct the image at higher scales, making it extensible to include more scales. Compared to existing works, the proposed SatMAE++ with multi-scale pre-training is equally effective for both optical and multi-spectral imagery. Extensive experiments on six datasets reveal the merits of the proposed contributions, leading to state-of-the-art performance on all datasets. SatMAE++ achieves a mean average precision (mAP) gain of 2.5% for the multi-label classification task on the BigEarthNet dataset. Our code and pre-trained models are available at https://github.com/techmn/satmae_pp.
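The abstract's central architectural idea — a masked-autoencoder decoder whose base-resolution reconstruction is pushed to progressively higher scales through convolution-based upsampling blocks — can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the module names `UpsampleBlock` and `MultiScaleReconstructionHead`, the choice of transposed convolutions, the number of scales, and the 13-band Sentinel-2 example are all hypothetical.

```python
# Minimal PyTorch sketch (not the authors' exact code) of multi-scale
# reconstruction: a decoder feature map is upsampled by stacked
# convolution-based blocks, with a reconstruction head at each scale,
# so additional scales are added by appending one more block and head.
import torch
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Doubles spatial resolution with a transposed convolution (assumed design)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(x)

class MultiScaleReconstructionHead(nn.Module):
    """Maps decoder features to image reconstructions at several scales."""
    def __init__(self, dim: int, out_ch: int, num_scales: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [UpsampleBlock(dim, dim) for _ in range(num_scales - 1)]
        )
        self.heads = nn.ModuleList(
            [nn.Conv2d(dim, out_ch, kernel_size=1) for _ in range(num_scales)]
        )

    def forward(self, feat: torch.Tensor) -> list[torch.Tensor]:
        # feat: (B, dim, H, W) decoder output reshaped onto a spatial grid
        outs = [self.heads[0](feat)]          # base-scale reconstruction
        for block, head in zip(self.blocks, self.heads[1:]):
            feat = block(feat)                # 2x upsample
            outs.append(head(feat))           # reconstruction at the new scale
        return outs                           # reconstructions at 1x, 2x, 4x, ...

if __name__ == "__main__":
    # Hypothetical example: 13 output channels as in Sentinel-2 multi-spectral data
    head = MultiScaleReconstructionHead(dim=512, out_ch=13)
    recons = head(torch.randn(2, 512, 14, 14))
    print([r.shape for r in recons])  # 14x14, 28x28, and 56x56 outputs
```

In a pre-training setup along these lines, a reconstruction loss would be computed at each output scale against targets at the corresponding resolution; the abstract's claim of extensibility follows from the fact that supporting another scale only requires one additional upsampling block and head.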


