
Adapting Segment Anything Model for Change Detection in HR Remote Sensing Images (2309.01429v4)

Published 4 Sep 2023 in cs.CV

Abstract: Vision Foundation Models (VFMs) such as the Segment Anything Model (SAM) allow zero-shot or interactive segmentation of visual contents, thus they are quickly applied in a variety of visual scenes. However, their direct use in many Remote Sensing (RS) applications is often unsatisfactory due to the special imaging characteristics of RS images. In this work, we aim to utilize the strong visual recognition capabilities of VFMs to improve the change detection of high-resolution Remote Sensing Images (RSIs). We employ the visual encoder of FastSAM, an efficient variant of the SAM, to extract visual representations in RS scenes. To adapt FastSAM to focus on some specific ground objects in the RS scenes, we propose a convolutional adaptor to aggregate the task-oriented change information. Moreover, to utilize the semantic representations that are inherent to SAM features, we introduce a task-agnostic semantic learning branch to model the semantic latent in bi-temporal RSIs. The resulting method, SAM-CD, obtains superior accuracy compared to the SOTA methods and exhibits a sample-efficient learning ability that is comparable to semi-supervised CD methods. To the best of our knowledge, this is the first work that adapts VFMs for the CD of HR RSIs.


Summary

  • The paper introduces SAM-CD, a framework that adapts FastSAM with a convolutional adaptor to enhance change detection accuracy in VHR remote sensing imagery.
  • It incorporates a semantic learning branch to discern true semantic changes from natural scene variations using bi-temporal image analysis.
  • Experiments demonstrate that SAM-CD outperforms state-of-the-art fully-supervised methods in precision, recall, and F1 scores while reducing the need for extensive annotations.

Adapting Vision Foundation Models for Enhanced Change Detection in Remote Sensing

The paper "Adapting Segment Anything Model for Change Detection in VHR Remote Sensing Images" presents a novel approach for improving change detection (CD) in very high-resolution (VHR) remote sensing images. The researchers leverage the capabilities of Vision Foundation Models (VFMs), particularly the Segment Anything Model (SAM) and its variant, FastSAM, to improve the accuracy and efficiency of CD tasks in remote sensing, a domain characterized by unique imaging properties that challenge traditional vision models trained on natural scenes.

Context and Motivation

CD is a crucial task in remote sensing: identifying and segmenting changes between multi-temporal images of the same geographic area. This capability is instrumental in environmental monitoring, urban development, and disaster management. Traditional methods typically compare co-registered bi-temporal images using Convolutional Neural Networks (CNNs) and, more recently, Vision Transformers (ViTs). However, these methods require substantial annotated data and struggle to separate incidental temporal differences, such as illumination or seasonal variation, from genuine semantic changes.
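
To make the conventional pipeline concrete, here is a minimal sketch of a Siamese-difference CD baseline of the kind the paper improves upon, assuming PyTorch; the encoder layout and layer sizes are illustrative placeholders rather than any specific published architecture.

```python
import torch
import torch.nn as nn

class SiameseDiffCD(nn.Module):
    """Minimal bi-temporal change-detection baseline: a shared (Siamese)
    encoder embeds each image, and a decoder maps the feature difference
    to a per-pixel change map. Layer sizes are illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # shared weights for both dates
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Conv2d(64, 1, 1)       # per-pixel "changed" logits

    def forward(self, img_t1, img_t2):
        f1, f2 = self.encoder(img_t1), self.encoder(img_t2)
        return self.decoder(torch.abs(f1 - f2))  # compare, then classify

# Usage: change logits for a pair of co-registered 256x256 RGB tiles.
model = SiameseDiffCD()
t1, t2 = torch.randn(1, 3, 256, 256), torch.randn(1, 3, 256, 256)
change_logits = model(t1, t2)                    # shape (1, 1, 256, 256)
```

Because such a baseline learns from the difference signal alone, it needs many labeled image pairs and has no semantic prior with which to separate scene variations from real changes, which is the gap SAM-CD targets with a pre-trained VFM encoder.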

Methodology

The authors propose SAM-CD, a change detection framework that adapts FastSAM for remote sensing applications. FastSAM is employed for its robust visual representations, which capture the semantics of ground objects in remote sensing images; a convolutional adaptor then tailors these representations for task-oriented learning. The approach rests on two components (sketched together after this list):

  1. FastSAM Adaptation: A convolutional adaptor refines the FastSAM-derived features for more accurate detection of semantically relevant changes. The multi-scale features extracted by FastSAM are aggregated and processed through this adaptor to align them with the task objectives, bridging the domain gap left by pre-training primarily on natural images.
  2. Semantic Learning Branch: A task-agnostic semantic learning branch shifts the model's focus from detecting raw image differences to modeling the latent semantics of multi-temporal remote sensing images. This branch exploits temporal constraints on bi-temporal image semantics to distinguish natural scene variations from true semantic changes.
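
The following is a schematic sketch of how these two components could fit together, assuming PyTorch. The ToyEncoder is a stand-in for the frozen FastSAM visual encoder, and the channel sizes, fusion-by-summation scheme, and prediction heads are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for the frozen FastSAM visual encoder: returns multi-scale features."""
    def __init__(self):
        super().__init__()
        self.s1 = nn.Conv2d(3, 64, 3, stride=2, padding=1)
        self.s2 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.s3 = nn.Conv2d(128, 256, 3, stride=2, padding=1)

    def forward(self, x):
        f1 = torch.relu(self.s1(x))
        f2 = torch.relu(self.s2(f1))
        f3 = torch.relu(self.s3(f2))
        return [f1, f2, f3]

class ConvAdaptor(nn.Module):
    """Component 1: projects frozen multi-scale features into a task space."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=64):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c, out_channels, 3, padding=1),
                          nn.BatchNorm2d(out_channels),
                          nn.ReLU(inplace=True))
            for c in in_channels)

    def forward(self, feats):
        target = feats[0].shape[-2:]             # finest spatial resolution
        fused = 0
        for f, p in zip(feats, self.proj):       # upsample each scale, then sum
            fused = fused + F.interpolate(p(f), size=target,
                                          mode='bilinear', align_corners=False)
        return fused

class SAMCDSketch(nn.Module):
    def __init__(self, encoder, num_classes=8):
        super().__init__()
        self.encoder = encoder
        for param in self.encoder.parameters():  # the VFM stays frozen; only the
            param.requires_grad = False          # adaptor and heads are trained
        self.adaptor = ConvAdaptor()
        self.change_head = nn.Conv2d(64, 1, 1)              # binary change logits
        self.semantic_head = nn.Conv2d(64, num_classes, 1)  # task-agnostic semantics

    def forward(self, img_t1, img_t2):
        f1 = self.adaptor(self.encoder(img_t1))
        f2 = self.adaptor(self.encoder(img_t2))
        # Component 1 output: change from the temporal feature difference
        # (logits at feature resolution; upsample to full size as needed).
        change = self.change_head(torch.abs(f1 - f2))
        # Component 2 output: per-date semantic maps for the temporal constraint.
        sem1, sem2 = self.semantic_head(f1), self.semantic_head(f2)
        return change, sem1, sem2
```

The design point the sketch captures is that the VFM encoder is frozen and only the lightweight adaptor and heads are trained, which is what makes the method sample-efficient; the paper's actual temporal constraint on the semantic branch is more involved than the per-date heads shown here.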

Results and Implications

The SAM-CD model achieves superior performance over state-of-the-art fully-supervised methods and demonstrates a sample-efficient learning process akin to semi-supervised methods. Experimental validation across multiple datasets shows significant increases in accuracy metrics such as precision, recall, and F1 scores, underscoring the effectiveness of integrating VFMs with dedicated adaptors for domain-specific applications.
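
For reference, the reported metrics are the standard ones computed over the change class, where TP, FP, and FN count changed pixels that are predicted correctly, falsely predicted as changed, and missed, respectively:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```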

The findings suggest significant implications for the field of remote sensing and image analysis. By reducing the dependency on extensive labeled datasets, SAM-CD opens pathways for more efficient and practical applications in scenarios where large-scale annotations are impractical or impossible. Furthermore, the approach highlights the potential for VFMs to be fine-tuned or adapted to niche domains beyond their original training scope, providing a flexible foundation for future advancements.

Future Directions

Future work could explore additional VFMs, further refine the interplay between semantic and change features, and extend the adaptation strategies to other challenging domains such as medical imaging. With continued advancement, the role of VFMs in semantically rich, change-detection tasks will likely expand, potentially reshaping methodologies in fields that require precise temporal-spatial analysis. The integration of VFMs with domain-specific learning branches remains a promising direction for ongoing research and development.
