Bottleneck Attention Module (BAM): Enhancing Neural Network Representational Power
"Bottleneck Attention Module" (BAM) by Jongchan Park et al. introduces a novel attention mechanism tailored to improve the representational capabilities of convolutional neural networks (CNNs). This paper's primary contribution, BAM, is a modular, lightweight attention mechanism designed to be seamlessly integrated into existing CNN architectures. It introduces hierarchical attention at key points within a network to enhance performance across several standard image recognition benchmarks.
Core Contributions
The key contributions of this paper include:
- Bottleneck Attention Module (BAM): A versatile module that can be incorporated into any feed-forward CNN architecture. BAM generates an attention map focusing on both channel and spatial information, refining intermediate feature maps efficiently.
- Hierarchical Attention Integration: BAM is strategically placed at network bottlenecks where downsampling occurs, ensuring crucial information is amplified or suppressed at critical network stages.
- Extensive Experimental Validation: The efficacy of BAM is confirmed through rigorous experimentation on CIFAR-100, ImageNet-1K, VOC 2007, and MS COCO datasets, demonstrating performance improvements in both classification and detection tasks.
Methodology
Structure of BAM
BAM employs a dual-pathway mechanism to infer attention:
- Channel Attention: Aggregates global information across channels by applying global average pooling followed by a multi-layer perceptron (MLP) to generate a channel-wise attention map.
- Spatial Attention: Utilizes dilated convolutions to capture contextual information across spatial dimensions, generating a spatial attention map.
Both pathways produce attention maps that are combined via element-wise summation, followed by a sigmoid activation, yielding a single 3D attention map. This map is multiplied element-wise with the input features to emphasize or suppress them, and the result is added back to the original feature map through a residual connection to ease gradient flow.
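The PyTorch-style sketch below illustrates this structure. The reduction ratio and dilation value follow the paper's reported defaults (r = 16, d = 4), but the exact layer layout is an illustrative assumption rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    """Sketch of a Bottleneck Attention Module: channel and spatial attention,
    combined by summation, passed through a sigmoid, and applied residually."""

    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        # Channel attention: global average pooling followed by a small MLP.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels // reduction),
            nn.BatchNorm1d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: channel reduction, dilated 3x3 convolutions for
        # a larger receptive field, then projection to a one-channel map.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels // reduction,
                      kernel_size=3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels // reduction,
                      kernel_size=3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Channel attention map, broadcast over the spatial dimensions.
        channel_att = self.channel_mlp(x).view(b, c, 1, 1)
        # Spatial attention map, broadcast over the channels.
        spatial_att = self.spatial(x)  # shape (b, 1, h, w)
        # Element-wise sum of the two maps, then sigmoid -> 3D attention map.
        attention = torch.sigmoid(channel_att + spatial_att)
        # Residual combination: F' = F + F * M(F).
        return x + x * attention
```

Because the module preserves the input shape (e.g., `BAM(channels=256)(features)` returns a tensor the same size as `features`), it can be dropped between existing layers without altering the rest of the network.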
Key Design Choices and Ablations
BAM's design choices, such as the use of both channel and spatial pathways and the specific combining strategy (element-wise summation), are validated through extensive ablation studies. Results show:
- The combination of channel and spatial attention significantly enhances performance compared to using either one independently.
- The hierarchical attention that emerges from placing BAM at network bottlenecks provides a favorable trade-off between accuracy and computational overhead (see the placement sketch after this list).
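As a rough illustration of where the modules sit, the sketch below wraps the residual stages of a torchvision ResNet-50 so that a BAM block processes each stage's output before the next stage downsamples it. The helper function and wiring are assumptions for illustration (assuming a recent torchvision), not the authors' released code.

```python
import torch.nn as nn
from torchvision.models import resnet50

def add_bam_to_resnet(bam_cls):
    """Hypothetical helper: attach a BAM block after each of the first three
    residual stages of ResNet-50, i.e. at the bottlenecks where the feature
    map is about to be spatially downsampled."""
    net = resnet50(weights=None)
    # Output channel widths of the ResNet-50 stages: 256, 512, 1024.
    net.layer1 = nn.Sequential(net.layer1, bam_cls(256))
    net.layer2 = nn.Sequential(net.layer2, bam_cls(512))
    net.layer3 = nn.Sequential(net.layer3, bam_cls(1024))
    return net

model = add_bam_to_resnet(BAM)  # BAM as defined in the earlier sketch
```

Because only a handful of modules are added for the whole network, the parameter and FLOP overhead remains small regardless of network depth.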
Empirical Results
Image Classification Tasks
On CIFAR-100, BAM integrated with various state-of-the-art architectures (e.g., ResNet, WideResNet, ResNeXt) consistently outperforms baseline models without adding significant computational overhead. For instance, ResNet50 with BAM achieves an error rate of 20.00% compared to 21.49% without BAM, with only a small increase in parameters and FLOPs.
On ImageNet-1K, similar performance improvements are observed. For example, ResNet101 with BAM achieves a Top-1 error of 22.44%, an appreciable improvement over the baseline ResNet101 at 23.38%.
Object Detection Tasks
BAM's applicability extends to object detection. On the MS COCO dataset, integrating BAM into a standard Faster R-CNN detection pipeline improves mean average precision (mAP), and similar gains are observed on VOC 2007 with the StairNet detector.
Comparisons and Efficiency
In the reported comparisons, BAM achieves comparable or better accuracy than Squeeze-and-Excitation (SE) modules while adding fewer parameters. These results underline BAM's ability to refine features effectively without a significant increase in model complexity.
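For reference, a minimal sketch of the SE block (after Hu et al.; the exact layer layout here is an illustrative assumption) highlights the difference: SE re-weights channels only, with no spatial pathway.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation block: channel-only attention.
    Unlike BAM, there is no spatial pathway and no residual on the attention."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Squeeze to per-channel statistics, excite to per-channel weights.
        scale = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * scale
```

SE modules are typically inserted into every residual block, whereas BAM appears only at the few bottleneck locations, which helps explain its lower parameter overhead.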
Practical and Theoretical Implications
Practically, BAM's lightweight, modular design makes it particularly suitable for deployment in resource-constrained environments like mobile and embedded systems. Theoretically, the introduction of hierarchical attention at bottlenecks presents a novel approach to attention in neural networks, suggesting that strategic placement of attention modules can significantly enhance information processing within deep networks.
Conclusion and Future Directions
This paper provides a compelling case for the use of BAM as an effective, efficient means of incorporating attention in CNNs. The hierarchical approach to attention, coupled with the extensive validation across multiple datasets and tasks, confirms BAM's robustness and versatility. Future research may involve exploring BAM's integration with other types of neural networks and extending its application to additional vision and non-vision tasks. The insights gained from BAM could also inspire new attention mechanisms and strategies for enhancing network performance.
By refining intermediate features at critical network points, BAM represents a significant step towards more adaptive and powerful neural network architectures.