RSMamba: Remote Sensing Image Classification with State Space Model (2403.19654v1)

Published 28 Mar 2024 in cs.CV

Abstract: Remote sensing image classification forms the foundation of various understanding tasks, serving a crucial function in remote sensing image interpretation. The recent advancements of Convolutional Neural Networks (CNNs) and Transformers have markedly enhanced classification accuracy. Nonetheless, remote sensing scene classification remains a significant challenge, especially given the complexity and diversity of remote sensing scenarios and the variability of spatiotemporal resolutions. The capacity for whole-image understanding can provide more precise semantic cues for scene discrimination. In this paper, we introduce RSMamba, a novel architecture for remote sensing image classification. RSMamba is based on the State Space Model (SSM) and incorporates an efficient, hardware-aware design known as the Mamba. It integrates the advantages of both a global receptive field and linear modeling complexity. To overcome the limitation of the vanilla Mamba, which can only model causal sequences and is not adaptable to two-dimensional image data, we propose a dynamic multi-path activation mechanism to augment Mamba's capacity to model non-causal data. Notably, RSMamba maintains the inherent modeling mechanism of the vanilla Mamba, yet exhibits superior performance across multiple remote sensing image classification datasets. This indicates that RSMamba holds significant potential to function as the backbone of future visual foundation models. The code will be available at https://github.com/KyanChen/RSMamba.

Authors

Keyan Chen
Bowen Chen
Chenyang Liu
Wenyuan Li
Zhengxia Zou
Zhenwei Shi

Summary

The paper presents RSMamba, an architecture for remote sensing image classification built on the State Space Model (SSM). Remote sensing image classification underpins earth observation applications such as land mapping and urban planning. The task is complicated by variations in spatio-temporal resolution and by the diversity of scenes. CNNs and Transformers have advanced the field, but their limitations motivate the exploration of alternative architectures. RSMamba aims to combine linear modeling complexity, as found in CNNs, with the global receptive field characteristic of Transformers.

Key Contributions

Introduction of RSMamba: RSMamba is built on the State Space Model and adopts Mamba, an efficient, hardware-aware SSM design. Its modeling complexity is linear, which keeps it computationally efficient on large-scale remote sensing image classification.
Dynamic Multi-Path Activation: The vanilla Mamba can only model causal sequences, so it is not directly suited to two-dimensional image data. RSMamba adds a dynamic multi-path activation mechanism that lets the model handle non-causal, position-sensitive data and therefore process two-dimensional images more effectively.
Performance and Efficiency: In evaluations on the UC Merced, AID, and RESISC45 datasets, RSMamba outperformed contemporary CNN- and Transformer-based methods. The approach balances accuracy against resource cost, which makes it a promising candidate backbone for future visual foundation models.
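
The linear-complexity and causality points in the list above can be illustrated with a minimal sketch of a discrete state space recurrence, h_t = a * h_{t-1} + b * x_t and y_t = c . h_t, evaluated in a single pass over the sequence. This is not the authors' implementation: the function name `ssm_scan`, the diagonal state, and the fixed parameter values are assumptions chosen for illustration, and Mamba's input-dependent (selective) parameters and hardware-aware scan are omitted.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear SSM over a 1-D sequence in one pass.

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state)
    y_t = sum(c * h_t)
    Cost grows linearly with len(xs).
    """
    h = [0.0] * len(a)  # hidden state, one entry per state dimension
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# Hypothetical fixed parameters for a 2-dimensional state.
a = [0.5, 0.9]
b = [1.0, 1.0]
c = [1.0, 0.5]

ys = ssm_scan([1.0, 0.0, 0.0, 2.0], a, b, c)
ys_changed = ssm_scan([1.0, 0.0, 0.0, 5.0], a, b, c)

# Causality: changing the last input leaves all earlier outputs untouched.
print(ys[:3] == ys_changed[:3])
```

Because each output depends only on the current and earlier inputs, a single scan in one direction cannot let an early patch see a later one, which is the limitation the multi-path mechanism targets.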

Methodology Overview

RSMamba converts a remote sensing image into a one-dimensional sequence and captures long-range dependencies with a Multi-Path SSM Encoder. The image is first split into patches, and positional encodings are added to the resulting tokens. The multi-path mechanism then processes the token sequence in forward, reverse, and shuffled orders, so that each token can draw on context from anywhere in the image rather than only from the tokens that precede it in a single scan order.
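
The steps above can be sketched as a toy example in plain Python. Everything here is illustrative rather than taken from the paper's code: `patchify`, `causal_mixer`, and `multi_path` are hypothetical names, a running mean stands in for the Mamba block, each patch is reduced to a single scalar feature, positional encodings are left out, and the softmax gate over the three paths is an assumed form of the dynamic activation.

```python
import math
import random


def patchify(image, p):
    """Split an H x W grid into non-overlapping p x p patches (row-major, flattened)."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(p) for j in range(p)]
        for r in range(0, h, p)
        for c in range(0, w, p)
    ]


def causal_mixer(tokens):
    """Stand-in for a causal sequence block (assumption): a running mean,
    so the output at step t depends only on tokens at steps <= t."""
    out, total = [], 0.0
    for t, x in enumerate(tokens, start=1):
        total += x
        out.append(total / t)
    return out


def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]


def multi_path(tokens, gate_logits, seed=0):
    """Mix tokens along forward, reverse, and shuffled orders, then fuse the paths."""
    n = len(tokens)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    orders = [list(range(n)), list(range(n - 1, -1, -1)), perm]
    weights = softmax(gate_logits)  # assumed gating over the three paths
    fused = [0.0] * n
    for order, w in zip(orders, weights):
        mixed = causal_mixer([tokens[i] for i in order])
        for pos, i in enumerate(order):
            fused[i] += w * mixed[pos]  # scatter back to the original position
    return fused


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = patchify(image, 2)
tokens = [sum(patch) / len(patch) for patch in patches]  # toy scalar embedding
fused = multi_path(tokens, gate_logits=[0.0, 0.0, 0.0])
print(tokens, fused)
```

In the forward path alone, the first token sees nothing but itself. Once the reverse and shuffled paths are fused in, its output also reflects later patches, which is the non-causal behaviour the encoder is after.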

Numerical Results and Analysis

The paper reports RSMamba's performance on several datasets, where it consistently outperforms the CNN and Transformer baselines. For instance, the base version of RSMamba, with significantly fewer parameters, outperformed ResNet, DeiT, and ViT models, achieving F1 scores of 93.88, 91.66, and 94.84 on the UC Merced, AID, and RESISC45 datasets, respectively. These results support the claim that an efficient, global modeling strategy offers substantial advantages in remote sensing image classification.
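
For readers unfamiliar with the metric, the following is a minimal sketch of an F1 score computed over a multi-class scene classification result. The macro-averaging across classes is an assumption, since the summary does not state which averaging the paper uses, and the labels are made-up placeholders rather than data from any of the datasets.

```python
def macro_f1(y_true, y_pred):
    """Per-class F1 = 2*TP / (2*TP + FP + FN), averaged equally over classes."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for k in classes:
        tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical scene labels for five images.
y_true = ["farm", "farm", "port", "port", "river"]
y_pred = ["farm", "port", "port", "port", "river"]
score = macro_f1(y_true, y_pred)
print(round(100 * score, 2))
```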

Implications and Future Directions

The findings have both practical and theoretical implications. Practically, combining CNN-like efficiency with the global understanding of Transformers opens the way to deployment in computationally constrained environments without giving up accuracy. Theoretically, the results suggest that further work on state space models in computer vision could yield architectures with better performance and efficiency.

Future research could extend RSMamba by exploring additional modalities of remote sensing data, such as hyperspectral and SAR imagery, or by incorporating it into broader remote sensing tasks like object detection and change detection. There is also potential for integrating RSMamba with existing large-scale modeling paradigms, thereby enhancing its versatility and impact on practical applications.

In short, RSMamba advances remote sensing image classification by pairing state space models with a hardware-aware design, pointing toward more accurate and efficient processing of complex visual data.
