CDXLSTM: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory (2411.07863v3)

Published 12 Nov 2024 in cs.CV, cs.LG, and eess.IV

Abstract: In complex scenes and varied conditions, effectively integrating spatial-temporal context is crucial for accurately identifying changes. However, current RS-CD methods lack a balanced consideration of performance and efficiency. CNNs lack global context, Transformers are computationally expensive, and Mambas face CUDA dependence and local correlation loss. In this paper, we propose CDXLSTM, with a core component that is a powerful XLSTM-based feature enhancement layer, integrating the advantages of linear computational complexity, global context perception, and strong interpretability. Specifically, we introduce a scale-specific Feature Enhancer layer, incorporating a Cross-Temporal Global Perceptron customized for semantic-accurate deep features, and a Cross-Temporal Spatial Refiner customized for detail-rich shallow features. Additionally, we propose a Cross-Scale Interactive Fusion module to progressively interact global change representations with spatial responses. Extensive experimental results demonstrate that CDXLSTM achieves state-of-the-art performance across three benchmark datasets, offering a compelling balance between efficiency and accuracy. Code is available at https://github.com/xwmaxwma/rschange.

Summary

The paper applies XLSTM to remote sensing change detection (RS-CD), capturing spatial and temporal context with linear computational complexity.
It combines scale-specific feature enhancement with cross-scale interactive fusion to improve both local detail and global semantic understanding.
Empirical results demonstrate state-of-the-art F1-scores on LEVIR-CD, WHU-CD, and CLCD while significantly reducing computational overhead.

Overview of CDXFormer: Advancements in Remote Sensing Change Detection

The paper "CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory" (titled CDXLSTM in the version listed above) targets the limitations of existing Remote Sensing Change Detection (RS-CD) methods: CNNs lack global context, Transformers are computationally expensive, and Mamba-based approaches depend on CUDA and lose local correlations. The authors present CDXFormer, an architecture built around an XLSTM-based feature enhancement layer that models both spatial and temporal context efficiently.

Key Contributions and Methodological Innovations

Introduction of XLSTM: The paper pioneers the application of XLSTM in RS-CD tasks. Compared to traditional methods, XLSTM offers multiple advantages, including linear computational complexity, global context awareness, and enhanced interpretability. These properties allow for more efficient and intuitive modeling of change detection, offering a compelling alternative to CNNs' limited global context modeling and Transformers’ computational inefficiency due to quadratic complexity.
Scale-specific Feature Enhancer: CDXFormer incorporates a scale-specific Feature Enhancer layer comprising a Cross-Temporal Global Perceptron (CTGP) and a Cross-Temporal Spatial Refiner (CTSR). The CTGP focuses on semantic accuracy in low-resolution branches, enhancing semantic differences. Meanwhile, the CTSR is tailored for the high-resolution branches to refine spatial details, a crucial factor in precise change localization.
Cross-Scale Interactive Fusion (CSIF) Module: A notable component of the architecture is the CSIF module, which facilitates the integration of global semantic changes with spatial information across scales. This module ensures that spatial detail preservation is prioritized in high-resolution change representations while enriching global semantic understanding from lower resolutions.
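
The linear-complexity property attributed to XLSTM above can be illustrated with the matrix-memory recurrence of the mLSTM cell from the xLSTM literature. The following is a minimal sketch, not the authors' implementation: it assumes scalar gates supplied by the caller, equal query, key, and value dimensions, and it omits gate stabilisation, output gating, and all learned projections. The function name mlstm_scan and its parameters are hypothetical, and whether CDXFormer uses exactly this cell is an assumption.

```python
from typing import List, Sequence

Vector = List[float]


def mlstm_scan(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    input_gates: Sequence[float],
    forget_gates: Sequence[float],
) -> List[Vector]:
    """Simplified mLSTM-style recurrence over a token sequence (illustrative only).

    Per step: C <- f*C + i*(v k^T), n <- f*n + i*k, h = C q / max(|n . q|, 1).
    Gate stabilisation, output gating, and projections are omitted.
    """
    d = len(keys[0])
    memory = [[0.0] * d for _ in range(d)]  # matrix memory C, shape d x d
    normalizer = [0.0] * d                  # normalizer state n, length d
    outputs: List[Vector] = []

    for q, k, v, i_gate, f_gate in zip(queries, keys, values, input_gates, forget_gates):
        # Decay the old memory and write the new key-value outer product.
        for r in range(d):
            for c in range(d):
                memory[r][c] = f_gate * memory[r][c] + i_gate * v[r] * k[c]
        normalizer = [f_gate * n + i_gate * kc for n, kc in zip(normalizer, k)]

        # Read out with the query; max(..., 1) keeps the division bounded.
        denom = max(abs(sum(n * qc for n, qc in zip(normalizer, q))), 1.0)
        outputs.append(
            [sum(memory[r][c] * q[c] for c in range(d)) / denom for r in range(d)]
        )

    return outputs


# Toy usage: two tokens with orthogonal keys, gates fixed to 1 (no forgetting).
hidden = mlstm_scan(
    queries=[[1.0, 0.0], [0.0, 1.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[2.0, 3.0], [4.0, 5.0]],
    input_gates=[1.0, 1.0],
    forget_gates=[1.0, 1.0],
)
```

Each step touches only the fixed-size state (a d x d matrix and a length-d vector), so the total cost grows linearly with the number of tokens, in contrast to the pairwise token comparisons of self-attention. In the toy usage, each query retrieves the value stored under the matching key.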

Numerical Results and Comparative Performance

CDXFormer is evaluated on three benchmark RS-CD datasets: LEVIR-CD, WHU-CD, and CLCD. It reports state-of-the-art F1-scores on all three:

LEVIR-CD Dataset: F1 of 90.89.
WHU-CD Dataset: F1 of 92.58, reported as a significant margin over existing methods.
CLCD Dataset: F1 of 78.73, indicating that the model handles complex scene changes effectively.
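
For readers unfamiliar with the metric, the F1-scores above are the harmonic mean of precision and recall on the "changed" class. The sketch below uses the standard definition on binary change masks; it is not the authors' evaluation script, and the function name change_f1 and the toy masks are hypothetical. The scores listed above are presumably percentages, that is, this value multiplied by 100.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 1 = changed pixel, 0 = unchanged pixel


def change_f1(pred: Mask, target: Mask) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the changed class of two binary masks."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                tp += 1  # change correctly detected
            elif p == 1 and t == 0:
                fp += 1  # false alarm
            elif p == 0 and t == 1:
                fn += 1  # missed change

    # Guard against empty denominators (no predicted or no true changes).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy masks: 2 true positives, 1 false positive, 1 false negative.
predicted_mask = [[1, 1, 0], [0, 1, 0]]
reference_mask = [[1, 0, 0], [0, 1, 1]]
precision, recall, f1 = change_f1(predicted_mask, reference_mask)
```

Because changed pixels are typically a small fraction of a scene, F1 on the changed class is more informative than overall pixel accuracy, which a model could inflate by predicting "unchanged" everywhere.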

These results come with low computational overhead: CDXFormer has a relatively low parameter count (16.19M) and FLOPs (3.92G), balancing accuracy against efficiency.

Theoretical Implications and Future Prospects

Introducing XLSTM into RS-CD addresses the shortcomings of current methods and invites further study of its adaptability to other areas of computer vision and remote sensing. The paper suggests that future research could develop lighter cross-temporal XLSTM architectures to improve performance and efficiency further.

Conclusion

CDXFormer advances remote sensing change detection by building on an extended LSTM variant. It balances global context modeling against computational efficiency, and its results on three benchmarks suggest potential for broader applications. The work shows that LSTM-based methods can be adapted to complex scene changes and sets the stage for future remote sensing research along similar lines.
