SpectralGPT: Spectral Remote Sensing Foundation Model (2311.07113v3)

Published 13 Nov 2023 in cs.CV

Abstract: The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; 4) trains on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.

Summary

The paper proposes SpectralGPT, a foundation model that uses a 3D generative pretrained transformer to analyze spectral remote sensing images.
It combines a progressive training approach with 3D tokenization to couple spatial and spectral information.
Benchmark results include 99.21% scene classification accuracy on EuroSAT, along with reported gains in semantic segmentation and change detection.

SpectralGPT: Spectral Remote Sensing Foundation Model

The paper "SpectralGPT: Spectral Remote Sensing Foundation Model" addresses a gap in foundation models for remote sensing: few are designed for spectral data. Current models, built primarily for RGB imagery, cannot fully exploit the rich spectral information available in remote sensing (RS) data. The authors propose SpectralGPT, a universal RS foundation model that uses a novel 3D generative pretrained transformer (GPT) architecture to handle spectral RS images.

SpectralGPT offers several distinct advantages over existing models:

Progressive Training and Scalability: SpectralGPT accepts input images of varying sizes, resolutions, time series, and geographical regions. A progressive training approach makes this possible and lets the model draw on extensive RS big data.
3D Tokenization and Modeling: The model generates 3D tokens, each spanning a spatial patch and a group of spectral bands, so that spatial and spectral information are coupled in a single representation.
Multi-target Reconstruction: By reconstructing multiple targets during pretraining, SpectralGPT captures spectrally sequential patterns, which are essential for accurate scene understanding in remote sensing applications.
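
The 3D token generation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `tokenize_3d`, the 8 x 8 x 3 token size, and the 12-band 96 x 96 input are assumptions chosen for illustration and are not stated in this summary. The sketch only shows how a spectral cube can be cut into non-overlapping blocks that each cover a spatial patch and a group of adjacent bands.

```python
import numpy as np


def tokenize_3d(cube, patch=8, band_group=3):
    """Split a (bands, height, width) cube into flattened 3D tokens.

    Hypothetical helper: `patch` and `band_group` are illustrative values.
    """
    c, h, w = cube.shape
    if c % band_group or h % patch or w % patch:
        raise ValueError("cube dimensions must be divisible by the token size")
    gc, gh, gw = c // band_group, h // patch, w // patch
    # View the cube as a grid of (band_group, patch, patch) blocks.
    blocks = cube.reshape(gc, band_group, gh, patch, gw, patch)
    # Move the three grid axes to the front, block contents to the back.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    # One row per token; each row holds band_group * patch * patch values.
    return blocks.reshape(gc * gh * gw, band_group * patch * patch)


rng = np.random.default_rng(0)
cube = rng.random((12, 96, 96)).astype(np.float32)  # assumed input size
tokens = tokenize_3d(cube)
print(tokens.shape)
```

With these assumed sizes, each token mixes three adjacent bands with an 8 x 8 spatial neighbourhood, which is the sense in which a 3D token couples spatial and spectral information. A transformer would normally project each flattened token to an embedding before attention; that step is omitted here.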

Pretrained on one million spectral RS images, the resulting models contain over 600 million parameters, demonstrating a substantial capacity for representation learning. SpectralGPT's efficacy is evaluated across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection. The results indicate a significant improvement over existing state-of-the-art (SOTA) methods.

Numerical Results and Bold Claims

The evaluation reports that SpectralGPT outperforms the compared models. In single-label RS scene classification, SpectralGPT achieved an accuracy of 99.21% on the EuroSAT dataset. In multi-label scene classification on the BigEarthNet-S2 dataset, the model reached macro and micro mean average precision (mAP) scores of 88.22% and 87.50%, respectively. These results improve on models such as ViT and SatMAE.
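
The difference between macro and micro mAP is easy to miss, so here is a small sketch of one common way to compute both for multi-label predictions. The summary does not describe the paper's exact evaluation protocol, so the function names and the toy labels and scores below are assumptions for illustration only. Macro averaging computes average precision per class and then averages the classes; micro averaging pools every (sample, class) pair into one ranking.

```python
import numpy as np


def average_precision(y_true, scores):
    """Average precision of one ranking (no special handling of tied scores)."""
    order = np.argsort(-scores, kind="stable")
    hits = y_true[order]
    n_pos = hits.sum()
    if n_pos == 0:
        return float("nan")  # undefined when there are no positives
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values at the ranks where a positive occurs.
    return float((precision_at_k * hits).sum() / n_pos)


def macro_map(y_true, scores):
    """Mean of per-class average precision, skipping classes with no positives."""
    per_class = [average_precision(y_true[:, k], scores[:, k])
                 for k in range(y_true.shape[1])]
    return float(np.nanmean(per_class))


def micro_map(y_true, scores):
    """Average precision over all (sample, class) pairs pooled together."""
    return average_precision(y_true.ravel(), scores.ravel())


# Toy data: 4 samples, 2 classes (made-up values).
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.4, 0.6], [0.1, 0.3]])
print(macro_map(y_true, scores), micro_map(y_true, scores))
```

On this toy input the macro score is higher than the micro score, which shows that the two averages can differ even for the same predictions.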

For semantic segmentation on the SegMunich dataset, SpectralGPT achieved an overall accuracy (OA) of 82.7% and a mean intersection over union (mIoU) of 51.0%. For change detection on the OSCD dataset, the model reached an F1 score of 54.29%; both results are reported as improvements over the compared methods.
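
For readers unfamiliar with the metrics quoted above, the following sketch computes OA, mIoU, and a binary F1 score from a confusion matrix using their standard definitions. It is a generic illustration, not the paper's evaluation code; the helper names and the toy label arrays are assumptions.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def overall_accuracy(cm):
    """Fraction of pixels whose predicted class matches the true class."""
    return float(np.trace(cm) / cm.sum())


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return float((tp[valid] / union[valid]).mean())


def f1_score(cm, positive=1):
    """F1 of the `positive` class: 2TP / (2TP + FP + FN)."""
    tp = cm[positive, positive]
    fp = cm[:, positive].sum() - tp
    fn = cm[positive, :].sum() - tp
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


# Toy 3-class segmentation labels (made-up values).
seg_true = np.array([0, 0, 1, 1, 2, 2])
seg_pred = np.array([0, 1, 1, 1, 2, 0])
seg_cm = confusion_matrix(seg_true, seg_pred, num_classes=3)
print(overall_accuracy(seg_cm), mean_iou(seg_cm))

# Toy binary change map: 1 = changed, 0 = unchanged (made-up values).
chg_true = np.array([0, 1, 1, 0, 1])
chg_pred = np.array([0, 1, 0, 0, 1])
chg_cm = confusion_matrix(chg_true, chg_pred, num_classes=2)
print(f1_score(chg_cm))
```

In change detection the changed class is usually rare, which is one reason F1 on that class is reported instead of overall accuracy.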

Implications and Future Developments

SpectralGPT is presented as a significant advance for remote sensing and geoscience. By drawing on the full spectral content of the data, it supports more accurate and efficient processing of RS big data, with potential benefits for Earth Observation (EO) applications such as ecosystem monitoring and geological exploration. The use of a 3D GPT architecture for spectral data may also inform models in related fields and increase the practical value of spectral RS data.

In future work, expanding the training dataset to cover a broader range of spectral variations and other types of RS data could improve model robustness. Incorporating additional modalities, such as temporal or thermal data, may further extend the capabilities of the SpectralGPT framework. Applying SpectralGPT to other domains could also yield useful insights across artificial intelligence, geoscience, and related fields.
