
Coordinate Attention for Efficient Mobile Network Design (2103.02907v1)

Published 4 Mar 2021 in cs.CV

Abstract: Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks by embedding positional information into channel attention, which we call "coordinate attention". Unlike channel attention that transforms a feature tensor to a single feature vector via 2D global pooling, the coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction and meanwhile precise positional information can be preserved along the other spatial direction. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps that can be complementarily applied to the input feature map to augment the representations of the objects of interest. Our coordinate attention is simple and can be flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt, and EfficientNet with nearly no computational overhead. Extensive experiments demonstrate that our coordinate attention is not only beneficial to ImageNet classification but more interestingly, behaves better in down-stream tasks, such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.

Citations (2,452)

Summary

  • The paper introduces a novel coordinate attention mechanism that preserves positional information within channel attention to enhance mobile network efficiency.
  • It employs dual one-dimensional pooling operations along vertical and horizontal directions to capture long-range dependencies and fine spatial details.
  • Experimental results reveal improvements on ImageNet, COCO, and Pascal VOC, confirming significant gains in accuracy, AP, and mIoU under constrained computational budgets.

Coordinate Attention for Efficient Mobile Network Design

Introduction

The paper "Coordinate Attention for Efficient Mobile Network Design" by Qibin Hou, Daquan Zhou, and Jiashi Feng proposes a novel attention mechanism designed to enhance the efficiency and effectiveness of mobile networks. The authors argue that while channel attention mechanisms such as Squeeze-and-Excitation (SE) have significantly boosted the performance of mobile networks, they generally overlook positional information, which matters for spatially selective attention. The proposed "coordinate attention" addresses this limitation by integrating positional information into the channel attention mechanism, capturing both cross-channel dependencies and long-range spatial information.

Methodology

The core innovation of this paper is the "coordinate attention" mechanism, which factorizes the traditional channel attention into two one-dimensional processes to maintain precise positional information. Here are the key steps involved:

  1. Coordinate Information Embedding: Instead of using 2D global pooling, coordinate attention utilizes two separate 1D pooling operations that aggregate features along vertical and horizontal directions, respectively. This design captures long-range dependencies in one direction while preserving positional information in the other direction.
  2. Coordinate Attention Generation: The aggregated feature maps are concatenated and passed through a shared 1×1 convolutional transformation. The result is then split into two separate tensors, which are further transformed to generate direction-aware and position-sensitive attention maps. These maps are applied to the input feature via multiplication to emphasize the representations of interest (see the PyTorch sketch after this list).
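
The following minimal PyTorch sketch follows the two steps above. It is an illustrative reconstruction, not the authors' code (the reference implementation lives at the GitHub link in the abstract); the Hardswish activation and the reduction ratio of 32 are assumptions based on common practice, not verified settings.

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate attention: two 1D poolings -> shared 1x1 conv -> two attention maps."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)          # assumed bottleneck width
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()                    # assumed non-linearity
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Step 1: coordinate information embedding via directional 1D pooling.
        x_h = self.pool_h(x)                         # (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)     # (N, C, W, 1)
        # Step 2: concatenate, transform with a shared 1x1 conv, then split.
        y = torch.cat([x_h, x_w], dim=2)             # (N, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))        # (N, mid, H+W, 1)
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = self.conv_h(y_h).sigmoid()                      # (N, C, H, 1)
        a_w = self.conv_w(y_w.permute(0, 1, 3, 2)).sigmoid()  # (N, C, 1, W)
        # Both attention maps broadcast over the input feature map.
        return x * a_h * a_w
```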

The design ensures that coordinate attention captures the precise positional information crucial for vision tasks, thereby enhancing the model's ability to recognize and localize objects more accurately.
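
As a quick sanity check, the module preserves the spatial and channel dimensions of its input, so it can be dropped between existing layers; the tensor sizes below are hypothetical:

```python
x = torch.randn(2, 64, 56, 56)  # hypothetical batch of feature maps
att = CoordAtt(channels=64)
print(att(x).shape)             # torch.Size([2, 64, 56, 56])
```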

Experimental Results

The paper presents extensive experiments across several tasks to demonstrate the effectiveness of the coordinate attention mechanism. Here are some highlighted results:

  • Image Classification: Using MobileNetV2 as the baseline, coordinate attention achieved a 0.8% gain in top-1 accuracy on ImageNet, outperforming the SE attention and CBAM under the same computational constraints (a sketch of one possible integration into MobileNetV2 follows this list).
  • Object Detection: When integrated into MobileNetV2 and tested with the SSDLite320 detector on COCO and Pascal VOC datasets, coordinate attention showed significant improvements in AP metrics. For instance, on COCO, the model achieved a 24.5% AP, compared to 23.7% for SE and 23.0% for CBAM.
  • Semantic Segmentation: The advantage of coordinate attention was even more pronounced in tasks requiring dense predictions. On Pascal VOC 2012, models with coordinate attention achieved a 73.96% mIoU, compared to 72.52% for SE and 71.67% for CBAM.
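
For concreteness, the sketch below shows one plausible way to plug the module into a MobileNetV2-style inverted residual block: after the depthwise convolution, mirroring the usual SE placement. The exact insertion point used by the authors is fixed in the official repository; this placement is an assumption for illustration.

```python
# Hypothetical inverted residual block with coordinate attention inserted
# after the depthwise convolution (placement assumed, not verified).
class InvertedResidualCA(nn.Module):
    def __init__(self, inp: int, oup: int, expand: int = 6, stride: int = 1):
        super().__init__()
        hidden = inp * expand
        self.use_res = stride == 1 and inp == oup
        self.block = nn.Sequential(
            nn.Conv2d(inp, hidden, 1, bias=False),   # pointwise expansion
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),    # depthwise 3x3
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            CoordAtt(channels=hidden),               # coordinate attention
            nn.Conv2d(hidden, oup, 1, bias=False),   # pointwise projection
            nn.BatchNorm2d(oup),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_res else out
```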

Implications and Future Work

The results underscore the practical implications of coordinate attention across various computer vision tasks, particularly those involving mobile networks where computational efficiency is paramount. The ability to capture positional information alongside inter-channel dependencies makes coordinate attention a versatile and powerful tool for enhancing mobile network performance.

Theoretically, this research opens avenues for further exploration of attention mechanisms in constrained environments. Future work may delve into optimizing the reduction ratio further or integrating coordinate attention with other forms of architectural innovations to push the boundaries of mobile network capabilities.

Conclusion

The paper makes a substantial contribution by introducing coordinate attention, which integrates positional information into channel attention, effectively enhancing the performance of mobile networks in image classification, object detection, and semantic segmentation. These findings not only validate the potential of coordinate attention in practical applications but also encourage further research into efficient attention mechanisms. The versatility and low computational overhead make coordinate attention a valuable addition to the toolkit for mobile network designers.
