- The paper introduces the Pyramid Squeeze Attention (PSA) module, which efficiently extracts multi-scale spatial features and strengthens channel dependencies in CNNs.
- It demonstrates that embedding PSA into ResNet bottlenecks to form the EPSA block yields a 1.93% Top-1 accuracy gain on ImageNet over SENet-50.
- The study highlights EPSANet's scalability and versatility: it improves image classification, object detection, and instance segmentation while keeping computational costs low.
Overview of EPSANet: An Efficient Pyramid Squeeze Attention Block on Convolutional Neural Network
The paper introduces EPSANet, a novel backbone architecture that enhances convolutional neural networks (CNNs) through a lightweight and efficient attention module called Pyramid Squeeze Attention (PSA). By embedding this module into the bottleneck blocks of ResNet, the authors create a new representational unit, the Efficient Pyramid Squeeze Attention (EPSA) block. The primary objective is to offer a scalable, plug-and-play component that improves the multi-scale representation capability of network architectures across computer vision tasks such as image classification, object detection, and instance segmentation.
Key Contributions
- Pyramid Squeeze Attention (PSA) Module: The PSA module is the cornerstone of the proposed architecture. It processes the input tensor at multiple scales, extracting diverse spatial information while establishing long-range channel dependencies. It combines a multi-scale pyramid convolution structure with channel-wise attention recalibration to yield a refined feature map rich in contextual information (a sketch follows this list).
- Efficient Pyramid Squeeze Attention (EPSA) Block: Replacing the 3x3 convolution in the ResNet bottleneck with the PSA module yields the EPSA block. The block is flexible and scalable, can be dropped into existing network architectures, and improves performance at low computational cost (see the second sketch below).
- Significant Performance Improvements: Extensive experiments demonstrate that EPSANet outperforms state-of-the-art attention mechanisms such as SE, CBAM, and FcaNet. Notable gains include a 1.93% increase in Top-1 accuracy on ImageNet compared to SENet-50 and improved object detection and instance segmentation results on the MS COCO dataset.
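To make the PSA mechanism concrete, below is a minimal PyTorch sketch based on the split-and-concat design the paper describes: the input is split channel-wise into four scale groups, each group is processed by a group convolution with a different kernel size, a per-scale SE branch produces channel weights, and a softmax across the scale axis recalibrates them. The class names, kernel sizes, and group counts here are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class SEWeight(nn.Module):
    """Squeeze-and-excitation channel weighting used inside the PSA sketch."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # (b, c, h, w) -> per-channel weights of shape (b, c, 1, 1)
        return self.fc(self.pool(x))

class PSAModule(nn.Module):
    """Pyramid Squeeze Attention (sketch): multi-scale group convolutions,
    per-scale SE weights, softmax recalibration across scales."""
    def __init__(self, channels, scales=4,
                 kernel_sizes=(3, 5, 7, 9), groups=(1, 4, 8, 16)):
        super().__init__()
        assert channels % scales == 0, "channels must split evenly across scales"
        self.scales = scales
        split = channels // scales  # split must also be divisible by each group count
        self.convs = nn.ModuleList(
            nn.Conv2d(split, split, kernel_size=k, padding=k // 2, groups=g)
            for k, g in zip(kernel_sizes, groups)
        )
        self.se = nn.ModuleList(SEWeight(split) for _ in range(scales))
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        b, c, h, w = x.shape
        split = c // self.scales
        # Split channels and convolve each chunk with a different kernel size
        feats = [conv(chunk) for conv, chunk in zip(self.convs, x.split(split, dim=1))]
        feats = torch.stack(feats, dim=1)                     # (b, S, c/S, h, w)
        # Per-scale channel attention, renormalized across the scale axis
        attn = torch.stack(
            [se(f) for se, f in zip(self.se, feats.unbind(dim=1))], dim=1
        )                                                     # (b, S, c/S, 1, 1)
        out = feats * self.softmax(attn)
        return out.reshape(b, c, h, w)
```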
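Building on that sketch, an EPSA block would then follow the standard ResNet bottleneck with the 3x3 convolution swapped for the PSA module; the version below omits stride handling and assumes the caller supplies a `downsample` projection when channel counts differ, so it is again a hedged sketch rather than the paper's code.

```python
class EPSABlock(nn.Module):
    """ResNet-style bottleneck with the 3x3 convolution replaced by PSA."""
    expansion = 4

    def __init__(self, in_channels, planes, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.psa = PSAModule(planes)          # replaces the usual 3x3 conv
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample          # 1x1 projection when shapes differ

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.psa(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)

# Quick shape check: in_channels already equals planes * expansion here,
# so no downsample projection is needed.
block = EPSABlock(256, 64)
y = block(torch.randn(2, 256, 56, 56))
print(y.shape)  # torch.Size([2, 256, 56, 56])
```

Stacking such blocks in place of the original bottlenecks is, per the paper, what forms EPSANet; the expansion factor and layer ordering above mirror the vanilla ResNet bottleneck rather than any EPSANet-specific tuning.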
Evaluation and Technical Details
Evaluated on ImageNet and MS COCO, EPSANet demonstrates significant performance gains across multiple metrics. In image classification, the EPSANet(Large) variant achieves 78.64% Top-1 accuracy at a computational cost of 4.72 GFLOPs, surpassing SENet-50's 76.71%. In object detection and instance segmentation with Faster R-CNN and Mask R-CNN, EPSANet consistently improves AP, demonstrating its effectiveness across a range of object sizes and categories.
Implications and Future Directions
The compelling results of EPSANet have broad implications. The combination of performance gains and computational efficiency benefits high-demand computing environments, and the flexible nature of the EPSA blocks also makes the approach attractive for constrained systems, such as mobile or embedded deployments where computational resources are limited. Given the scalability and adaptability of the PSA and EPSA blocks, integrating them into lightweight CNN architectures or other neural network models is a natural direction for future work. Applying these techniques in other domains, including real-time video processing and 3D computer vision, may open further avenues for exploration and application.
The paper leaves the door open for continued innovation in how attention mechanisms can advance neural network architectures while balancing performance and resource utilization.