Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images (2009.02130v4)

Published 3 Sep 2020 in eess.IV and cs.CV

Abstract: Semantic segmentation of remote sensing images plays an important role in a wide range of applications including land resource management, biosphere monitoring and urban planning. Although the accuracy of semantic segmentation in remote sensing images has been increased significantly by deep convolutional neural networks, several limitations exist in standard models. First, for encoder-decoder architectures such as U-Net, the utilization of multi-scale features causes the underuse of information, where low-level features and high-level features are concatenated directly without any refinement. Second, long-range dependencies of feature maps are insufficiently explored, resulting in sub-optimal feature representations associated with each semantic class. Third, even though the dot-product attention mechanism has been introduced and utilized in semantic segmentation to model long-range dependencies, the large time and space demands of attention impede the actual usage of attention in application scenarios with large-scale input. This paper proposed a Multi-Attention-Network (MANet) to address these issues by extracting contextual dependencies through multiple efficient attention modules. A novel attention mechanism of kernel attention with linear complexity is proposed to alleviate the large computational demand in attention. Based on kernel attention and channel attention, we integrate local feature maps extracted by ResNeXt-101 with their corresponding global dependencies and reweight interdependent channel maps adaptively. Numerical experiments on three large-scale fine resolution remote sensing images captured by different satellite sensors demonstrate the superior performance of the proposed MANet, outperforming the DeepLab V3+, PSPNet, FastFCN, DANet, OCRNet, and other benchmark approaches.

Summary

The paper proposes MANet, a multi-attention framework that enhances semantic segmentation of high-resolution remote sensing images by integrating efficient attention modules.
It introduces a kernel attention mechanism with linear complexity that significantly reduces computational bottlenecks and memory usage.
Experimental evaluations on datasets like ISPRS Potsdam and Vaihingen reveal that MANet outperforms benchmarks in mIoU and F1-score metrics.

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

The paper "Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images" by Rui Li et al. addresses several limitations of standard methods for semantic segmentation of high-resolution remote sensing imagery. Accurate land cover classification underpins applications such as resource management and urban planning, and the paper aims to improve segmentation accuracy with attention mechanisms that remain efficient on large inputs.

Key Contributions

Multi-Attention-Network (MANet): The paper proposes MANet, which integrates multiple efficient attention modules to extract contextual dependencies. It targets a weakness of encoder-decoder frameworks such as U-Net, which concatenate low-level and high-level features directly without refinement and therefore underuse multi-scale information.
Kernel Attention with Linear Complexity: A kernel attention mechanism is introduced whose cost grows linearly with input size, in contrast to the large time and memory demands of conventional dot-product attention. This eases the processing and memory bottlenecks that otherwise limit attention on the large inputs typical of fine-resolution remote sensing imagery.
Enhanced Backbone Architecture: MANet uses ResNeXt-101 as its backbone to extract local feature maps, which supply the dense features needed for semantic segmentation.
Channel and Kernel Attention Integration: Kernel attention and channel attention are combined to integrate local feature maps with their corresponding global dependencies and to reweight interdependent channel maps adaptively, improving the feature representation for each semantic class.
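
To make the linear-complexity idea concrete, the following is a minimal NumPy sketch of kernelized attention, not the authors' implementation. It assumes a softplus feature map as the kernel and flattens the image into N positions; the function names, tensor shapes, and the choice of softplus are illustrative assumptions, since this summary does not specify them. The sketch shows that reordering the matrix products gives the same output as the quadratic form without ever building an N x N similarity matrix.

```python
import numpy as np

def softplus(x):
    # Assumed kernel feature map (illustrative choice). It is strictly positive,
    # so the normaliser computed below can never be zero.
    return np.logaddexp(0.0, x)

def kernel_attention_quadratic(q, k, v):
    """Reference form: materialises the full N x N similarity matrix."""
    phi_q, phi_k = softplus(q), softplus(k)
    sim = phi_q @ phi_k.T                       # (N, N): quadratic in N
    return (sim @ v) / sim.sum(axis=1, keepdims=True)

def kernel_attention_linear(q, k, v):
    """Same output, reordered so that cost is linear in the number of positions N."""
    phi_q, phi_k = softplus(q), softplus(k)
    kv = phi_k.T @ v                            # (d, d_v): summary of keys and values
    k_sum = phi_k.sum(axis=0)                   # (d,): summed key features
    numerator = phi_q @ kv                      # (N, d_v)
    denominator = phi_q @ k_sum                 # (N,)
    return numerator / denominator[:, None]

# Hypothetical sizes: N flattened pixel positions, d query/key channels, d_v value channels.
rng = np.random.default_rng(0)
n, d, d_v = 1024, 16, 32
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d_v))

out_quadratic = kernel_attention_quadratic(q, k, v)
out_linear = kernel_attention_linear(q, k, v)
max_abs_diff = float(np.max(np.abs(out_quadratic - out_linear)))
```

The two functions differ only in the order of multiplication. Because the similarity factorises into a product of feature maps, the keys and values can be summarised once in a small d x d_v matrix and reused for every query, which is what removes the N x N term.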

Experimental Evaluation

The effectiveness of MANet is demonstrated on large-scale datasets including ISPRS Potsdam, ISPRS Vaihingen, and the Gaofen Image Dataset (GID). MANet outperformed benchmark models including DeepLab V3+, PSPNet, FastFCN, DANet, and OCRNet. On ISPRS Potsdam, for instance, it improved on DANet and other strong networks in mean Intersection over Union (mIoU) and F1-score.
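
For readers unfamiliar with the two metrics, the sketch below shows how mIoU and mean F1-score are commonly computed from a confusion matrix. It is an illustration on a made-up 2 x 3 label map, not the paper's evaluation code; the helper names are hypothetical, and the paper's exact protocol (for example, which classes are averaged) is not described in this summary.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows index the true class, columns the predicted class.
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def miou_and_mean_f1(cm):
    # Assumes every class appears in the labels or predictions,
    # so that no denominator is zero.
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as the class but wrong
    fn = cm.sum(axis=1) - tp          # belonging to the class but missed
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou.mean(), f1.mean()

# Toy ground truth and prediction with three classes (made-up values).
y_true = np.array([[0, 0, 1],
                   [1, 2, 2]])
y_pred = np.array([[0, 1, 1],
                   [1, 2, 0]])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, mean_f1 = miou_and_mean_f1(cm)
```

On this toy input the per-class IoU values are 1/3, 2/3, and 1/2, so the mIoU is 0.5; the per-class F1 values are 0.5, 0.8, and 2/3.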

Implications and Future Directions

The implications of this work are both practical and theoretical. Practically, MANet's design suits remote sensing applications that need efficient processing and precise land cover classification of fine-resolution data. Theoretically, the kernel-based attention mechanism shows that long-range dependencies can be modeled at linear cost, which invites further work on computational efficiency in attention models. Future research could apply such mechanisms in other domains or integrate them with other neural architectures to improve segmentation performance.

In conclusion, the paper introduces MANet, which addresses two key challenges in semantic segmentation: the computational overhead of attention and the quality of feature representation. It offers a reference point for future work on remote sensing segmentation.
