Vision Foundation Models in Remote Sensing: A Survey (2408.03464v2)

Published 6 Aug 2024 in cs.CV and cs.LG

Abstract: AI technologies have profoundly transformed the field of remote sensing, revolutionizing data collection, processing, and analysis. Traditionally reliant on manual interpretation and task-specific models, remote sensing research has been significantly enhanced by the advent of foundation models: large-scale, pre-trained AI models capable of performing a wide array of tasks with unprecedented accuracy and efficiency. This paper provides a comprehensive survey of foundation models in the remote sensing domain. We categorize these models based on their architectures, pre-training datasets, and methodologies. Through detailed performance comparisons, we highlight emerging trends and the significant advancements achieved by those foundation models. Additionally, we discuss technical challenges, practical implications, and future research directions, addressing the need for high-quality data, computational resources, and improved model generalization. Our research also finds that pre-training methods, particularly self-supervised learning techniques like contrastive learning and masked autoencoders, remarkably enhance the performance and robustness of foundation models. This survey aims to serve as a resource for researchers and practitioners by providing a panorama of advances and promising pathways for continued development and application of foundation models in remote sensing.

Citations (2)

Summary

  • The paper provides a comprehensive review of AI foundation models, categorizing them for computer vision and domain-specific remote sensing tasks.
  • It rigorously compares performance metrics like mAP, mF1, and F1 scores in tasks such as scene classification, segmentation, detection, and change detection.
  • The study identifies key challenges including data quality and computational costs while proposing strategies for efficient model development and multi-modal integration.

Overview of Foundation Models in Remote Sensing

The paper "AI Foundation Models in Remote Sensing: A Survey" by Siqi Lu et al. provides a detailed examination of the advancements and applications of foundation models in remote sensing. This survey covers models proposed between June 2021 and June 2024, categorizing them based on their usage in computer vision and domain-specific tasks. The paper evaluates various models, compares their performance metrics, and highlights future research objectives in the domain of remote sensing.

Key Contributions

The paper enumerates the following principal contributions:

  1. Comprehensive Review: It presents a thorough review of foundation models in remote sensing, detailing their architectures, pre-training datasets, and methodologies; a brief illustration of the masked-autoencoder pre-training idea highlighted in the abstract follows this list. Models are categorized hierarchically, enhancing the clarity and accessibility of the insights provided.
  2. Categorization and Analysis: The models are categorized based on applications in computer vision tasks (e.g., scene classification, semantic segmentation, object detection, and change detection) and domain-specific tasks like environmental monitoring, agriculture, urban planning, disaster management, and archaeology.
  3. Challenges and Future Directions: The paper discusses the significant challenges in adopting foundation models for remote sensing, such as the need for high-quality data, computational resource demands, and improving model generalization. Future research directions to address these challenges are proposed.
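
The abstract and the first contribution both point to self-supervised pre-training, particularly masked autoencoders and contrastive learning, as the main source of the reported gains. The snippet below is a minimal, illustrative PyTorch sketch of the masked-autoencoder idea: hide a random subset of image patches, reconstruct them, and compute the loss only on the hidden patches. The module, dimensions, and masking ratio are hypothetical and not taken from any model in the survey.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Minimal masked-autoencoder sketch: reconstruct randomly masked image patches."""

    def __init__(self, patch_dim=768, embed_dim=256, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(nn.Linear(patch_dim, embed_dim), nn.GELU(),
                                     nn.Linear(embed_dim, embed_dim))
        self.decoder = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.GELU(),
                                     nn.Linear(embed_dim, patch_dim))
        self.mask_token = nn.Parameter(torch.zeros(embed_dim))

    def forward(self, patches):                 # patches: (batch, num_patches, patch_dim)
        batch, num_patches, _ = patches.shape
        num_masked = int(self.mask_ratio * num_patches)

        # Randomly choose which patches to hide for each image in the batch.
        noise = torch.rand(batch, num_patches, device=patches.device)
        masked_ids = noise.argsort(dim=1)[:, :num_masked]
        mask = torch.zeros(batch, num_patches, dtype=torch.bool, device=patches.device)
        mask[torch.arange(batch, device=patches.device).unsqueeze(1), masked_ids] = True

        # Encode, then replace masked positions with a learned mask token before decoding.
        latent = self.encoder(patches)
        latent = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(latent), latent)
        recon = self.decoder(latent)

        # Reconstruction loss is computed only on the masked patches.
        return ((recon - patches) ** 2).mean(dim=-1)[mask].mean()

# One illustrative pre-training step on random stand-in patches (8 images, 196 patches each).
model = TinyMAE()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(torch.randn(8, 196, 768))
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Real masked autoencoders typically drop the masked patches before the encoder and reconstruct them with a lightweight decoder, and the surveyed models differ in backbone and masking strategy; this sketch is only meant to convey the training objective.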

Foundation Models for Computer Vision Tasks

Scene Classification

Scene classification involves categorizing satellite images into predefined categories such as urban, forest, and agricultural areas. On multi-label benchmarks such as BigEarthNet, performance is reported as mean average precision (mAP); a short metric sketch follows this list. Notable models include:

  • SkySense: Achieves an mAP of 92.09% on the BigEarthNet dataset, demonstrating strong classification ability on this multi-label benchmark.
  • msGFM: Shows an mAP of 92.90%, indicating exceptional accuracy in scene classification tasks.
  • DINO-MC and DeCUR: These models also perform strongly in classification tasks, with mAP scores around 89.70%.
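
The mAP figures above are mean average precision over classes in a multi-label setting: each image can carry several land-cover labels, and average precision is computed per class and then averaged. A minimal sketch of that computation with scikit-learn, using synthetic labels and scores rather than outputs from any surveyed model:

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
num_images, num_classes = 1000, 19          # illustrative sizes for a multi-label benchmark

y_true = rng.integers(0, 2, size=(num_images, num_classes))      # ground-truth label matrix
y_score = np.clip(y_true * 0.7 + 0.5 * rng.random((num_images, num_classes)), 0.0, 1.0)

# Average precision per class, then the unweighted mean over classes (macro mAP).
ap_per_class = [average_precision_score(y_true[:, c], y_score[:, c])
                for c in range(num_classes)]
print(f"mAP: {100 * np.mean(ap_per_class):.2f}%")
```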

Semantic Segmentation

Semantic segmentation involves classifying each pixel of an image to produce detailed land-cover maps. Results are reported as mean F1 (mF1), mean intersection-over-union (mIoU), and overall accuracy (OA), all of which can be derived from a confusion matrix as in the sketch after this list:

  • SkySense: Attains a remarkable mF1 Score of 93.99% on the ISPRS Potsdam dataset.
  • CMID: Leads in mIoU performance with a score of 87.04%.
  • BFM: Scores highest in overall accuracy (OA) with 91.82%.
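
The three segmentation metrics quoted above all follow from a per-class confusion matrix; the sketch below shows one way to compute them, using toy label maps rather than any dataset from the survey.

```python
import numpy as np

def segmentation_metrics(pred, target, num_classes):
    """Overall accuracy (OA), mean IoU, and mean F1 from flat integer label maps."""
    # Per-class confusion matrix: rows = ground truth, columns = prediction.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (target.ravel(), pred.ravel()), 1)

    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class c but labelled as another class
    fn = cm.sum(axis=1) - tp          # labelled as class c but predicted as another class

    iou = tp / np.maximum(tp + fp + fn, 1)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1)
    oa = tp.sum() / cm.sum()
    return oa, iou.mean(), f1.mean()

# Toy 2-class example on a 4-pixel map.
pred = np.array([0, 1, 1, 0])
target = np.array([0, 1, 0, 0])
oa, miou, mf1 = segmentation_metrics(pred, target, num_classes=2)
print(f"OA={oa:.2%}  mIoU={miou:.2%}  mF1={mf1:.2%}")
```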

Object Detection

Object detection models identify and localize objects within satellite images. Results are reported as mAP or AP50, i.e., average precision with an IoU matching threshold of 0.5; a brief sketch of the IoU test follows this list:

  • RVSA: Achieves the highest mAP of 81.24% on the DOTA dataset.
  • MTP and SkySense: Deliver strong results on the DIOR and DIOR-R datasets, with an AP50 of 78% and an mAP of 78.73%, respectively.
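
AP50 counts a detection as correct when its intersection-over-union (IoU) with a ground-truth box of the same class reaches at least 0.5, and mAP averages the resulting per-class average precision. DOTA and DIOR-R use oriented bounding boxes, but the axis-aligned sketch below conveys the matching criterion; the boxes are made up for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive under AP50 when IoU >= 0.5.
prediction, ground_truth = (10, 10, 50, 50), (12, 8, 48, 52)
print(iou(prediction, ground_truth) >= 0.5)   # True for this pair
```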

Change Detection

Change detection models identify changes on the Earth's surface between images acquired at different times. Performance is reported as the F1 score of the detected changes; a small example follows this list:

  • SkySense: Stands out with the highest F1 Score of 60.06% on the OSCD dataset.
  • GFM: Demonstrates strong performance with an F1 Score of 59.82%.
  • MTP: Excels on the LEVIR-CD dataset with an F1 Score of 92.67%.
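
Change detection is usually scored as the F1 of the "changed" class because changed pixels are rare, so plain accuracy would be dominated by the unchanged background. A small illustrative sketch on toy binary change maps (not drawn from OSCD or LEVIR-CD):

```python
import numpy as np

def change_f1(pred_change, true_change):
    """F1 score of the 'changed' class for binary change maps (1 = changed)."""
    pred = pred_change.astype(bool).ravel()
    true = true_change.astype(bool).ravel()
    tp = np.sum(pred & true)
    fp = np.sum(pred & ~true)
    fn = np.sum(~pred & true)
    return 2 * tp / max(2 * tp + fp + fn, 1)

# Toy example: predicted vs. reference 3x3 change maps from a bi-temporal image pair.
true_map = np.array([[0, 0, 1], [0, 1, 1], [0, 0, 0]])
pred_map = np.array([[0, 1, 1], [0, 1, 0], [0, 0, 0]])
print(f"F1 = {change_f1(pred_map, true_map):.2%}")
```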

Foundation Models for Domain-Specific Tasks

Environmental Monitoring

  • GASSL and SatMAE: These models enhance the monitoring of environmental changes, aiding conservation efforts and policy-making through detailed assessments of deforestation, desertification, and pollution levels.

Agriculture and Forestry

  • EarthPT and GeCo: Provide insights into crop health, yield prediction, and optimal land-use management, helping to optimize agricultural practices.

Archaeology

  • GeoKR and RingMo: Revolutionize archaeological surveys by detecting and mapping features like ruins and artifacts from satellite imagery, aiding in efficient site identification.

Urban Planning and Development

  • CMID and SkySense: Facilitate sustainable urban growth by monitoring urban expansion, infrastructure development, and land use changes, aiding in effective urban planning.

Disaster Management

  • OFA-Net, DOFA, and Prithvi: Offer critical real-time data for disaster response, aiding in flood mapping, fire detection, and quick damage estimation to support timely and effective response measures.

Future Directions

Future research should prioritize the following:

  • Efficient Model Development: Focus on techniques such as model distillation, pruning, and quantization to reduce computational requirements (see the sketch after this list).
  • Multi-Modal Data Integration: Enhance the integration of multi-modal data for comprehensive insights.
  • Interdisciplinary Collaboration: Encourage collaboration between remote sensing experts, AI researchers, and domain specialists to address complex challenges.
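
As one concrete illustration of the first item, PyTorch ships utilities for magnitude pruning and post-training dynamic quantization. The snippet below applies them to a stand-in classification head; it is a generic sketch, not a recipe taken from any of the surveyed papers.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a fine-tuned remote-sensing classification head.
model = nn.Sequential(nn.Linear(768, 512), nn.ReLU(), nn.Linear(512, 19))

# Magnitude pruning: zero out the 30% smallest-magnitude weights of each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")        # make the pruning permanent

# Post-training dynamic quantization: int8 weights for linear layers at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    logits = quantized(torch.randn(1, 768))   # smaller, CPU-friendly inference
print(logits.shape)
```

Distillation, also mentioned above, would additionally train a small student model to match the outputs of a large foundation model, which this sketch does not cover.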

Conclusion

The survey by Siqi Lu et al. provides an in-depth examination of foundation models in remote sensing, highlighting significant advancements and identifying future research directions. With various models demonstrating superior performance in classification, segmentation, detection, and change detection, the paper underscores the transformative potential of AI in remote sensing across multiple domains. Further research in efficient model development, multi-modal data integration, and interdisciplinary collaboration will be crucial in addressing existing challenges and enhancing the application of foundation models in remote sensing.