- The paper proposes a unified GC block that integrates non-local and squeeze-excitation mechanisms to capture long-range dependencies efficiently.
- It introduces a Simplified Non-Local block that reduces computational overhead while maintaining performance parity with traditional non-local networks.
- Experimental results on COCO, ImageNet, and Kinetics benchmarks highlight significant performance gains and the framework's versatility.
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
The paper "GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond" presents a comprehensive paper on enhancing the efficiency and effectiveness of neural network architectures for capturing long-range dependencies in images and videos. The authors propose the Global Context Network (GCNet), which integrates the strengths of Non-Local Networks (NLNet) and Squeeze-Excitation Networks (SENet) into a unified framework for global context modeling.
Key Observations and Motivations
The NLNet has been instrumental in modeling long-range dependencies via query-specific global context aggregation. However, empirical analysis shows that the global contexts modeled are remarkably similar across different query positions within an image. This redundancy suggests that a more efficient, query-independent approach to global context modeling could be feasible without sacrificing performance. This observation forms the crux of the work, prompting the development of a simplified network that can offer the same accuracy with significantly reduced computational overhead.
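The redundancy claim is straightforward to probe. Below is a minimal sketch (my own construction, not the authors' code) that computes the query-specific attention maps of an embedded-Gaussian non-local block and measures their pairwise cosine similarity. On a trained network the paper reports these maps to be nearly identical across queries; this standalone snippet with random weights only illustrates the measurement itself.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, H, W = 64, 14, 14
x = torch.randn(1, C, H, W)  # hypothetical feature map

# 1x1-conv query/key projections, stand-ins for the NL block's W_q and W_k.
w_q = torch.nn.Conv2d(C, C // 2, kernel_size=1)
w_k = torch.nn.Conv2d(C, C // 2, kernel_size=1)

q = w_q(x).flatten(2).squeeze(0).t()  # (HW, C/2)
k = w_k(x).flatten(2).squeeze(0)      # (C/2, HW)
attn = F.softmax(q @ k, dim=-1)       # (HW, HW): one attention map per query

# Cosine similarity between every pair of query-specific attention maps.
sim = F.cosine_similarity(attn.unsqueeze(1), attn.unsqueeze(0), dim=-1)
print(f"mean pairwise similarity: {sim.mean().item():.3f}")
```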
Simplified Non-Local Block
The paper introduces a Simplified Non-Local (SNL) block, which models a single global attention map shared across all query positions. Comparative experiments show that the SNL block matches the accuracy of the original NLNet while requiring considerably less computation.
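A minimal PyTorch sketch of an SNL-style block, following the description above: a single softmax attention map, shared by all query positions, pools the input into one global context vector, which is then broadcast-added everywhere. Module and variable names are illustrative, not taken from the authors' released code.

```python
import torch
import torch.nn as nn

class SimplifiedNonLocal(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.w_k = nn.Conv2d(channels, 1, kernel_size=1)         # attention logits
        self.w_v = nn.Conv2d(channels, channels, kernel_size=1)  # value transform

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # One attention map over all H*W positions (query-independent).
        attn = self.w_k(x).view(n, 1, h * w).softmax(dim=-1)     # (N, 1, HW)
        # Weighted sum of features -> one global context vector per image.
        context = torch.bmm(x.view(n, c, h * w), attn.transpose(1, 2))  # (N, C, 1)
        context = self.w_v(context.view(n, c, 1, 1))             # (N, C, 1, 1)
        return x + context                                       # broadcast fusion

x = torch.randn(2, 64, 14, 14)
print(SimplifiedNonLocal(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```

Because the attention map is computed once per image rather than once per query position, the quadratic (HW by HW) attention cost of the original block collapses to a linear one.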
Unification and General Framework
A key insight of the paper is that both the SNL block and the SE block can be unified under a three-step framework for global context modeling, sketched in code after the list below:
- Context Modeling: Aggregating features from all positions to form a global context.
- Feature Transform: Capturing channel-wise interdependencies.
- Fusion: Merging global context features with local features.
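This abstraction can be written as a single template. The snippet below is a hypothetical illustration in which an SE-style block is expressed as one instantiation: average pooling for context modeling, a sigmoid-gated bottleneck for the transform, and channel-wise scaling for fusion.

```python
import torch
import torch.nn as nn

def global_context_block(x, context_modeling, transform, fusion):
    """Generic three-step template: pool -> transform -> fuse."""
    context = context_modeling(x)  # aggregate features from all positions
    context = transform(context)   # capture channel-wise interdependencies
    return fusion(x, context)      # merge global context with local features

# SE-style instantiation: average pooling, gated bottleneck, scaling fusion.
avg_pool = lambda t: t.mean(dim=(2, 3), keepdim=True)
gate = nn.Sequential(
    nn.Conv2d(64, 16, kernel_size=1), nn.ReLU(),
    nn.Conv2d(16, 64, kernel_size=1), nn.Sigmoid(),
)
scale = lambda t, ctx: t * ctx

x = torch.randn(2, 64, 14, 14)
print(global_context_block(x, avg_pool, gate, scale).shape)
```

The SNL block fits the same template with attention pooling, a 1x1-convolution transform, and additive fusion.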
The Global Context Block
Building upon this framework, the authors propose the Global Context (GC) block, which employs global attention pooling for context modeling and a lightweight bottleneck transform for feature processing. For fusion, the GC block broadcast-adds the transformed context to the features at every position, combining the strengths of the SNL and SENet designs. Incorporating layer normalization within the bottleneck transform further improves performance by alleviating optimization difficulties.
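A minimal sketch of a GC-style block along these lines: global attention pooling, a bottleneck transform with layer normalization, and additive fusion. The reduction ratio and layer names here are illustrative defaults, not necessarily the exact configuration used in the paper's experiments.

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    def __init__(self, channels: int, ratio: int = 16):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # attention-pooling logits
        hidden = max(channels // ratio, 1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),  # eases optimization of the bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Global attention pooling: one softmax map over all H*W positions.
        weights = self.attn(x).view(n, 1, h * w).softmax(dim=-1)           # (N, 1, HW)
        context = torch.bmm(x.view(n, c, h * w), weights.transpose(1, 2))  # (N, C, 1)
        context = context.view(n, c, 1, 1)
        # Bottleneck transform, then additive fusion broadcast over H and W.
        return x + self.transform(context)

x = torch.randn(2, 64, 14, 14)
print(GlobalContextBlock(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```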
Experimental Validation
The paper validates the GCNet across multiple major benchmarks:
- COCO Object Detection and Segmentation: The GCNet outperforms both NLNet and SENet in terms of AP (Average Precision) metrics with a negligible increase in FLOPs.
- ImageNet Classification: The GC blocks, when integrated into ResNet-50, yield a significant improvement in top-1 and top-5 accuracy metrics.
- Kinetics Action Recognition: Applying GC blocks to Slow-only networks shows notable gains in top-1 and top-5 accuracy, underscoring the effectiveness of the GC block in video tasks.
The detailed ablation studies and comparisons of pooling and fusion strategies underline the robustness of the proposed method, reinforcing the GC block's utility in practical deep learning deployments.
Implications and Future Developments
The implications of this research are twofold. Practically, it provides a straightforward method to enhance existing architectures with minimal computational overhead, potentially impacting a range of applications from object detection to action recognition. Theoretically, the paper bridges the conceptual gap between non-local and squeeze-excitation mechanisms, offering a more unified perspective on global context modeling.
Future developments might explore further optimizations of the bottleneck transform or extend the GC block's principles to other domains and tasks. Additionally, integrating this framework with evolving architectures or in conjunction with emerging training paradigms could yield even more substantial improvements.
In conclusion, the GCNet advances the field of neural network design by offering an efficient, scalable solution for global context modeling, achieving superior performance on diverse visual recognition tasks.