SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model (2305.02034v4)
Abstract: The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, because annotating Remote Sensing (RS) images is difficult and costly, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. In total, SAMRS contains 105,090 images and 1,668,241 instances, surpassing existing high-resolution RS segmentation datasets in size by several orders of magnitude. It provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. Moreover, preliminary experiments highlight the importance of segmentation pre-training with SAMRS for addressing task discrepancies and alleviating the limitations of scarce training data during fine-tuning. The code and dataset will be available at https://github.com/ViTAE-Transformer/SAMRS.
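As a rough illustration of the idea in the abstract, the sketch below turns detection-style annotations (boxes plus category ids) into semantic and instance label maps. It is a minimal sketch under stated assumptions, not the authors' pipeline: `mask_from_box` is a hypothetical stand-in that simply fills the box, where a real pipeline would call a box-prompted segmenter such as SAM, and the function names, box format `(x0, y0, x1, y1)`, and overlap rule are all illustrative choices.

```python
import numpy as np

def mask_from_box(image, box):
    """Hypothetical stand-in for a box-prompted segmenter such as SAM.

    It fills the whole box; a real pipeline would return the model's mask.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def boxes_to_labels(image, boxes, class_ids, segmenter=mask_from_box):
    """Build semantic and instance label maps from detection annotations.

    0 is background in both maps; instance ids start at 1.
    Assumption: where masks overlap, the later instance overwrites the earlier.
    """
    h, w = image.shape[:2]
    semantic = np.zeros((h, w), dtype=np.int32)
    instance = np.zeros((h, w), dtype=np.int32)
    for inst_id, (box, cls) in enumerate(zip(boxes, class_ids), start=1):
        mask = segmenter(image, box)
        semantic[mask] = cls       # category label per pixel
        instance[mask] = inst_id   # unique id per object
    return semantic, instance

# Toy example: two overlapping boxes with different categories.
image = np.zeros((8, 8, 3), dtype=np.uint8)
boxes = [(0, 0, 4, 4), (2, 2, 6, 6)]
class_ids = [1, 2]
semantic, instance = boxes_to_labels(image, boxes, class_ids)
```

Keeping the segmenter as a parameter means the same loop could, in principle, be reused with an actual model by passing a different callable; how overlapping masks should be resolved in practice is left open here.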
Authors: Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, Dacheng Tao, Liangpei Zhang