- The paper presents AFF and iAFF, new methods that dynamically fuse CNN features using multi-scale attention.
- It utilizes a multi-scale channel attention module to align semantic and scale information across feature maps.
- Empirical results demonstrate improved accuracy with fewer parameters on CIFAR-100 and ImageNet benchmarks.
Attentional Feature Fusion: An Overview
The paper "Attentional Feature Fusion" explores a novel approach to feature fusion in convolutional neural networks (CNNs) through a generalizable and adaptive mechanism known as attentional feature fusion (AFF). The motivation for this work arises from the limitations of traditional feature fusion methods, which typically employ simple operations like addition or concatenation without considering the semantic and scale inconsistencies among the features being combined.
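The contrast between the two fusion styles can be sketched in a few lines. This is a toy NumPy illustration, not the paper's implementation: plain addition weights both inputs equally everywhere, whereas an attention map `m` (here just a constant placeholder) lets the network softly select between the inputs per element.

```python
import numpy as np

def naive_fusion(x, y):
    # Traditional fusion: context-agnostic element-wise addition.
    return x + y

def attentional_fusion(x, y, m):
    # Attention-weighted fusion: m in [0, 1] softly selects between inputs,
    # so the result is a per-element convex combination of x and y.
    return m * x + (1.0 - m) * y

x = np.full((2, 4, 4), 1.0)   # toy channels-first feature map
y = np.full((2, 4, 4), 3.0)
m = np.full((2, 4, 4), 0.75)  # placeholder attention weights favoring x

z = attentional_fusion(x, y, m)   # 0.75 * 1.0 + 0.25 * 3.0 = 1.5 everywhere
```

In AFF, `m` is not fixed but produced by an attention module conditioned on the features themselves, which is what allows the fusion to adapt to semantic and scale mismatches.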
Key Contributions
The research makes several significant contributions:
- Attentional Feature Fusion (AFF): AFF provides a unified approach to feature fusion across diverse network scenarios, including same-layer fusion, short skip connections, and long skip connections. It integrates multi-scale channel attention to fuse features dynamically and adaptively.
- Iterative Attentional Feature Fusion (iAFF): Recognizing that the initial integration of the input features can itself be a bottleneck, the authors introduce iterative attentional feature fusion, which refines the result by applying a second round of attention to an initially fused feature map. This iterative approach yields further performance gains.
- Multi-Scale Channel Attention Module (MS-CAM): The MS-CAM addresses scale inconsistency by aggregating local and global contexts within the attention mechanism, which enhances the model's ability to detect objects of varying scales.
- Empirical Validation: The proposed methods outperform state-of-the-art networks, demonstrating their effectiveness with fewer parameters on CIFAR-100 and ImageNet datasets.
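The three contributions above can be sketched together. The following NumPy sketch is a simplification: batch normalization is omitted, the local and global branches share one pair of bottleneck weights (`w1`, `w2`) for brevity, and 1x1 convolutions are written as channel-wise linear maps. It captures the structure described in the paper: MS-CAM sums a spatially detailed local context with a globally pooled context before the sigmoid, AFF computes attention on the initial sum of the inputs, and iAFF runs a second attention pass on the first fused result.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def pointwise_conv(x, w):
    # 1x1 convolution over a (C, H, W) map: a linear map across channels.
    return np.einsum('oc,chw->ohw', w, x)

def ms_cam(x, w1, w2):
    # Multi-scale channel attention (simplified): the local branch keeps
    # spatial detail, the global branch pools it away; summing the two
    # contexts lets the attention map respond to objects at multiple scales.
    local = pointwise_conv(np.maximum(pointwise_conv(x, w1), 0.0), w2)
    g = x.mean(axis=(1, 2), keepdims=True)           # global average pooling
    glob = pointwise_conv(np.maximum(pointwise_conv(g, w1), 0.0), w2)
    return sigmoid(local + glob)                     # broadcast over H, W

def aff(x, y, w1, w2):
    # AFF: attention computed on the initial integration (here, addition),
    # then used as a soft selector between the two inputs.
    m = ms_cam(x + y, w1, w2)
    return m * x + (1.0 - m) * y

def iaff(x, y, w1, w2):
    # iAFF: the first fusion replaces raw addition as the initial
    # integration for a second attention pass.
    z1 = aff(x, y, w1, w2)
    m2 = ms_cam(z1, w1, w2)
    return m2 * x + (1.0 - m2) * y

rng = np.random.default_rng(0)
c, r = 8, 2                                  # channels and bottleneck width
w1 = rng.standard_normal((r, c)) * 0.1       # channel-reduction weights
w2 = rng.standard_normal((c, r)) * 0.1       # channel-restoration weights
x = rng.standard_normal((c, 6, 6))
y = rng.standard_normal((c, 6, 6))

z = aff(x, y, w1, w2)                        # fused map, same shape as inputs
```

Because the sigmoid keeps the attention weights strictly between 0 and 1, the fused output is everywhere a convex combination of the two inputs, which is the "soft selection" interpretation the paper gives to its fusion weights.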
Analysis of Numerical Results
The experimental results show that models employing AFF and iAFF outperform baseline models across different tasks and datasets. On CIFAR-100, the fused models reach higher accuracy with shallower networks, indicating efficient feature delivery and network compactness. On ImageNet, iAFF-ResNet-50 surpasses models such as Gather-Excite-ResNet-101 with 40% fewer parameters, underscoring the potential for a more economical yet effective design.
Implications and Future Directions
The implications of this research stretch across both theoretical and practical dimensions:
- Theoretical Impact: The introduction of a unified framework that can handle diverse fusion scenarios adds a new perspective to the design of neural architectures. The adoption of multi-scale attention in defining feature contexts may inspire further explorations into adaptive mechanisms in deep learning.
- Practical Application: The ability to achieve state-of-the-art performance with fewer parameters offers practical benefits when deploying models in resource-constrained environments. This efficiency makes the approach a natural fit for mobile AI applications, where computational resources are limited.
Looking towards future developments, incorporating similar attention-based methodologies could refine existing neural network architectures. These ideas might also extend to domains beyond computer vision, such as natural language processing and speech recognition, where feature fusion plays an equally pivotal role.
Conclusion
"Attentional Feature Fusion" significantly advances the discourse in feature fusion strategies within neural networks. The novel methodologies proposed provide fresh insights into achieving more efficient and effective neural architectures. This work lays a promising groundwork for further innovations in adaptive and dynamic model designs.