Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images
This paper presents a novel approach for semantic segmentation of fine-resolution remote sensing images, introducing the Multi-stage Attention ResU-Net (MAResU-Net). The key innovation of this research is the Linear Attention Mechanism (LAM), which efficiently reduces the computational complexity of the dot-product attention mechanism. This reduction from O(N²) to O(N) marks a significant advancement, enabling attention over large-scale inputs without compromising classification performance.
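The complexity reduction can be illustrated with a minimal NumPy sketch of linear attention. This is an illustrative reconstruction, not the authors' code: it assumes the first-order Taylor approximation e^(q·k) ≈ 1 + q·k applied to L2-normalized queries and keys, and the function name and `eps` parameter are the author's own choices here.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Linear-complexity attention via a first-order Taylor
    approximation of the softmax kernel (illustrative sketch).

    Q, K: (N, d) queries and keys; V: (N, d_v) values.
    """
    # L2-normalize queries and keys so the approximated similarity
    # 1 + q.k stays non-negative (valid attention weights).
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    N = V.shape[0]
    # Reassociate (Qn @ Kn.T) @ V as Qn @ (Kn.T @ V):
    # the O(N^2) pairwise similarity matrix is never formed,
    # so the cost is O(N) in the sequence length.
    kv = Kn.T @ V                         # (d, d_v)
    k_sum = Kn.sum(axis=0)                # (d,)
    numerator = V.sum(axis=0) + Qn @ kv   # (N, d_v)
    denominator = N + Qn @ k_sum          # (N,)
    return numerator / denominator[:, None]
```

Because the weights are non-negative and normalized by the denominator, each output row remains a weighted average of the value rows, just as in standard attention.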
Methodology
The proposed LAM modifies standard dot-product attention by replacing the softmax similarity with its first-order Taylor approximation. This adjustment ensures computational efficiency, allowing attention mechanisms to model dependencies across large inputs such as fine-resolution images. The integration of LAM into the U-Net architecture, enhanced with ResNet-based backbones, forms the crux of MAResU-Net. The design places attention blocks at multiple stages, refining feature maps across scales and improving the network's semantic segmentation capability.
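The multi-stage design can be sketched as follows. This is a simplified, hypothetical stage, not the paper's exact module: it assumes the attention block refines the encoder skip features before fusion with the upsampled decoder features, uses additive fusion and nearest-neighbor upsampling for brevity, and treats pixels as a token sequence for linear self-attention.

```python
import numpy as np

def _linear_self_attention(x, eps=1e-6):
    # Compact linear self-attention over pixel tokens
    # (queries = keys = values = x); see LAM for the derivation.
    n = x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)
    num = x.sum(axis=0) + n @ (n.T @ x)
    den = x.shape[0] + n @ n.sum(axis=0)
    return num / den[:, None]

def decoder_stage(decoder_feat, skip_feat):
    """One hypothetical MAResU-Net decoder stage: refine the encoder
    skip features with attention, then fuse them with the 2x-upsampled
    decoder features (U-Net-style skip connection).

    decoder_feat: (H/2, W/2, C); skip_feat: (H, W, C).
    """
    H, W, C = skip_feat.shape
    tokens = skip_feat.reshape(H * W, C)        # pixels as a sequence
    refined = _linear_self_attention(tokens).reshape(H, W, C)
    up = decoder_feat.repeat(2, axis=0).repeat(2, axis=1)  # nearest 2x
    return up + refined                          # additive fusion
```

Repeating this stage at every decoder level yields attention-refined features at multiple scales, which is the intuition behind the multi-stage strategy.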
Results
The performance of MAResU-Net was evaluated on the Vaihingen dataset, where it outperformed existing methods including U-Net, ResUNet-a, PSPNet, and DANet. It achieved the highest mean F1-score (90.277%), overall accuracy (OA, 90.860%), and mean Intersection over Union (mIoU, 83.301%). These figures highlight the architecture's capability to capture refined, fine-grained features in remote sensing images.
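For reference, the three reported metrics can be computed from a confusion matrix as below. This is a standard-definition sketch (function name and interface are illustrative), not the authors' evaluation script.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """Overall accuracy, mean F1, and mIoU from flattened label maps.

    y_true, y_pred: 1-D integer class arrays of equal length.
    """
    # Confusion matrix: rows = reference labels, cols = predictions.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    oa = tp.sum() / cm.sum()              # overall accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)      # per-class F1
    iou = tp / (tp + fp + fn)             # per-class IoU
    return oa, f1.mean(), iou.mean()
```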
Furthermore, statistical analysis using Kappa z-tests indicates a significant improvement in classification performance, validating the robustness of the MAResU-Net over comparative methods. The incorporation of attention blocks substantiates the efficacy of multi-stage attention strategies, particularly at lower-level feature extraction, contributing significantly to the network's enhanced performance.
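The Kappa z-test can be sketched as follows. Note the hedge: this uses a simplified large-sample variance approximation for the kappa coefficient rather than the full delta-method variance typically used in remote sensing accuracy assessment, and the function names are illustrative.

```python
import numpy as np

def cohens_kappa(cm):
    """Kappa coefficient and an approximate large-sample variance
    from a confusion matrix (simplified first-order approximation)."""
    n = cm.sum()
    po = np.trace(cm) / n                       # observed agreement
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    var = po * (1 - po) / (n * (1 - pe) ** 2)   # approximate variance
    return kappa, var

def kappa_z(cm_a, cm_b):
    """Z statistic for the difference between two independent kappas;
    |z| > 1.96 indicates a significant difference at the 5% level."""
    ka, va = cohens_kappa(cm_a)
    kb, vb = cohens_kappa(cm_b)
    return (ka - kb) / np.sqrt(va + vb)
```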
Implications and Future Directions
The introduction of LAM offers substantial implications for the development of efficient deep learning models in computer vision, particularly for applications involving large-scale and fine-resolution inputs. The reduction in computational complexity broadens the applicability of attention mechanisms within networks, potentially influencing future architectures in medical imaging, land cover classification, and beyond.
Future research could explore further optimizations of LAM for even more efficient handling of ultra-high-resolution imagery, and could adapt similar methodologies to other domains such as video processing and long-sequence modeling in NLP. Expanding the multi-stage attention approach offers further opportunities to capture and exploit contextual information effectively. The open-source release of MAResU-Net allows for community-driven advancement, encouraging collaborative improvement within the research community.