
SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model (2305.02034v4)

Published 3 May 2023 in cs.CV

Abstract: The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, due to the difficulties and high costs associated with annotating Remote Sensing (RS) images, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. In total, SAMRS contains 105,090 images and 1,668,241 instances, surpassing existing high-resolution RS segmentation datasets in size by several orders of magnitude. It provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. Moreover, preliminary experiments highlight the importance of conducting segmentation pre-training with SAMRS to address task discrepancies and alleviate the limitations posed by limited training data during fine-tuning. The code and dataset will be available at https://github.com/ViTAE-Transformer/SAMRS.

Authors (7)
  1. Di Wang
  2. Jing Zhang
  3. Bo Du
  4. Minqiang Xu
  5. Lin Liu
  6. Dacheng Tao
  7. Liangpei Zhang
Citations (80)

Summary

  • The paper introduces a scalable method using SAM to efficiently annotate remote sensing images, significantly reducing manual labeling costs.
  • The SAMRS dataset comprises 105,090 images and 1,668,241 instances, offering annotations usable for semantic segmentation, instance segmentation, and object detection tasks.
  • Preliminary experiments indicate that pre-training with SAMRS improves RS segmentation performance, especially in data-sparse fine-tuning scenarios.

An Overview of SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model

The paper "SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model" addresses a significant challenge in the field of remote sensing (RS)—the creation and scaling of segmentation datasets. The primary obstacle in developing these datasets is the labor-intensive and expensive process of annotating RS images at the pixel level. This paper leverages the distinct advantages of the Segment Anything Model (SAM) in conjunction with existing RS object detection datasets to propose an efficient methodology for generating large-scale RS segmentation datasets.

Data-centric Approach with SAM

The success of SAM highlights the value of data-centric strategies in machine learning. SAM exhibits strong zero-shot segmentation capability, handling RS images well despite how much they differ from the natural images it was trained on. This observation drives the core idea of the paper: prompting SAM with the annotations of existing RS object detection datasets to generate pixel-level labels, thereby constructing a robust segmentation dataset, named SAMRS, at low cost.
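
The box-prompted annotation idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: `sam_predict_mask` is a hypothetical stub standing in for a real SAM box-prompted prediction (e.g., via the segment-anything library), so that the surrounding pipeline logic can be shown end to end.

```python
import numpy as np

def sam_predict_mask(image, box):
    """Hypothetical stand-in for a SAM box-prompted prediction.

    A real pipeline would run SAM on the image with the box as a prompt;
    here we simply fill the box region so the surrounding logic is runnable.
    """
    mask = np.zeros(image.shape[:2], dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask

def annotate_with_boxes(image, boxes, labels):
    """Convert detection boxes into segmentation labels via per-box prompts.

    Each box yields one instance mask; masks are also rasterized into a
    semantic map, where later instances overwrite earlier ones on overlap
    and class 0 is reserved for background.
    """
    semantic = np.zeros(image.shape[:2], dtype=np.int32)
    instance_masks = []
    for box, label in zip(boxes, labels):
        mask = sam_predict_mask(image, box)
        semantic[mask] = label
        instance_masks.append(mask)
    return semantic, instance_masks
```

In the actual paper, the prompts come from existing detection annotations, so no new manual labeling is required to obtain the masks.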

The SAMRS Dataset

SAMRS comprises 105,090 images and 1,668,241 instances, surpassing existing high-resolution RS segmentation datasets in size by several orders of magnitude. The dataset retains critical object-level details (category, location, and instance information), facilitating its use across semantic segmentation, instance segmentation, and object detection tasks.
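
Because each object carries a mask plus a category, a single set of instance annotations can serve all three tasks. The sketch below, a simplification under assumed list-of-masks inputs rather than the dataset's actual storage format, derives a semantic map and axis-aligned detection boxes from per-instance masks.

```python
import numpy as np

def instances_to_task_labels(inst_masks, categories, shape):
    """Derive semantic-segmentation and detection labels from instance masks.

    inst_masks: list of boolean HxW arrays, one per object instance.
    categories: integer class ids aligned with inst_masks (0 = background).
    Returns a semantic map and (x0, y0, x1, y1) boxes recovered from
    each mask's extent.
    """
    semantic = np.zeros(shape, dtype=np.int32)
    boxes = []
    for mask, cat in zip(inst_masks, categories):
        semantic[mask] = cat
        ys, xs = np.nonzero(mask)
        boxes.append((int(xs.min()), int(ys.min()),
                      int(xs.max()) + 1, int(ys.max()) + 1))
    return semantic, boxes
```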

The authors analyzed several prompt types to find the configurations best suited to applying SAM to RS annotation. The resulting SAMRS dataset offers more diverse category representation and finer annotations than existing benchmarks. The ability to multiply the dataset's volume without increasing manual annotation overhead, made possible by SAM, is a significant achievement.

Preliminary Experiments and Implications

Preliminary experiments in this paper demonstrate the utility of pre-training segmentation networks on SAMRS, particularly for addressing the task discrepancy that arises when weights are transferred from classification-pretrained models. This is important given the noted performance gaps when classification-adapted models are applied directly to RS segmentation tasks without task-specific pre-training.

The paper's analysis demonstrates that segmentation pre-training with SAMRS can mitigate the challenges posed by task discrepancies, unlocking enhanced performance in RS segmentation applications. Notably, the findings show the largest improvements when fine-tuning data is limited, indicating that pre-training is especially effective where annotated training data is sparse.
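
The transfer step in such a pre-train-then-fine-tune workflow can be sketched as a state-dict merge: parameters shared with the pretrained segmentation model are copied over, while task-specific layers keep their fresh initialization. The function name and dict-of-arrays representation are illustrative, not the paper's code.

```python
import numpy as np

def transfer_pretrained_weights(pretrained_state, target_state):
    """Initialize a fine-tuning model from segmentation-pretrained weights.

    Keys present in both state dicts with matching shapes (e.g., a shared
    encoder/decoder) are taken from the pretrained checkpoint; parameters
    that exist only in the target model, such as a new task head, keep
    their fresh initialization.
    """
    merged = {}
    for key, value in target_state.items():
        source = pretrained_state.get(key)
        if source is not None and source.shape == value.shape:
            merged[key] = source  # reuse pretrained parameter
        else:
            merged[key] = value   # keep fresh initialization
    return merged
```

The same pattern applies directly to framework state dicts (e.g., PyTorch `state_dict()` mappings of tensors).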

Future Directions

The paper presents several avenues for further research. Future work could involve evaluating larger models pretrained on SAMRS, which might offer insights into the scalable application of SAM for even broader RS tasks. Additionally, exploring the impact of SAMRS on instance segmentation and object detection will help ascertain its broader applicability and validate the model transferability hypotheses posited in this paper.

Moreover, since the SAMRS dataset demonstrates efficacy in mitigating the training data limitations inherent in RS segmentation, its methodology might apply to broader domains within Earth observation and beyond. Thus, the scalability of the technique to other specialized imaging disciplines could be a significant focal point for future investigations.

Conclusion

In sum, the paper delivers a pragmatic and effective method for scaling RS segmentation datasets via SAM application. It significantly contributes to reducing annotation costs and enhances dataset characteristics, which are pivotal in advancing RS image analysis. By addressing task-specific challenges through intelligent pre-training strategies, the proposed approach sets a new pathway for improving the performance and adaptability of segmentation models in remote sensing and allied fields.