Relation-Aware Global Attention for Person Re-identification (1904.02998v2)

Published 5 Apr 2019 in cs.CV

Abstract: For person re-identification (re-id), attention mechanisms have become attractive as they aim at strengthening discriminative features and suppressing irrelevant ones, which matches well the key of re-id, i.e., discriminative feature learning. Previous approaches typically learn attention using local convolutions, ignoring the mining of knowledge from global structure patterns. Intuitively, the affinities among spatial positions/nodes in the feature map provide clustering-like information and are helpful for inferring semantics and thus attention, especially for person images where the feasible human poses are constrained. In this work, we propose an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning. Specifically, for each feature position, in order to compactly grasp the structural information of global scope and local appearance information, we propose to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions (e.g., in raster scan order), and the feature itself together to learn the attention with a shallow convolutional model. Extensive ablation studies demonstrate that our RGA can significantly enhance the feature representation power and help achieve the state-of-the-art performance on several popular benchmarks. The source code is available at https://github.com/microsoft/Relation-Aware-Global-Attention-Networks.


Summary

The paper introduces the RGA module, which uses pairwise correlations among all positions of a feature map to capture global structural information for attention learning.
It provides spatial and channel variants of this attention and combines them, going beyond earlier attention methods that are learned with local convolutions.
Experiments on benchmarks such as CUHK03, Market1501, and MSMT17 show state-of-the-art person re-identification accuracy.


The paper "Relation-Aware Global Attention for Person Re-identification" proposes an attention mechanism for person re-identification (re-id). The authors develop the Relation-Aware Global Attention (RGA) module, which is designed to capture global structural information and thereby improve discriminative feature learning, the central problem in person re-id.

Core Contributions

The RGA module computes attention from the relations across the entire feature map. For each feature position, the module stacks its pairwise correlations with all positions together with the feature itself, so that a shallow convolutional model sees both global structural information and local appearance information when inferring the attention. The approach improves the representation power of the features and achieves state-of-the-art results on several benchmarks, including CUHK03, Market1501, and MSMT17.
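For the spatial case, the stacking described above can be written out as follows. This is a sketch of the formulation as we understand it from the paper, not a verbatim copy: N = H×W positions indexed in raster-scan order, node features x_i ∈ ℝ^C, and θ, φ, ψ, ρ are embedding functions (each assumed to be a 1×1 convolution followed by batch normalization and ReLU); the symbol names are chosen here and may differ from the paper's.

```latex
\begin{aligned}
r_{i,j} &= \theta(x_i)^{\top}\,\phi(x_j), \qquad R = [\,r_{i,j}\,] \in \mathbb{R}^{N \times N},\\
r_i &= \big[\,R(i,:),\; R(:,i)\,\big] \in \mathbb{R}^{2N},\\
\tilde{y}_i &= \big[\,\operatorname{pool}_c\!\big(\psi(x_i)\big),\; \rho(r_i)\,\big],\\
a_i &= \sigma\!\big(W_2\,\operatorname{ReLU}(W_1\,\tilde{y}_i)\big), \qquad \hat{x}_i = a_i\, x_i .
\end{aligned}
```

Here pool_c averages over the channel dimension, W_1 reduces the channel dimension by a ratio d_1, W_2 maps it to a single value, and σ is the sigmoid. Stacking both the row R(i,:) and the column R(:,i) keeps the outgoing and incoming relations of node i, which differ whenever θ and φ are different embeddings.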

Methodological Insights

Relation-Aware Global Attention (RGA): Instead of learning attention with local convolutions, whose receptive fields restrict them to local context, RGA derives attention from the relations among all nodes of the feature map, where the nodes are either spatial positions or channels. For each node, the module concatenates an embedding of its relation vector with an embedding of its original feature and infers the attention value with a shallow model of two 1×1 convolutional layers.
Spatial and Channel Attentions (RGA-S and RGA-C): Separate modules handle spatial and channel-wise relations. RGA-S treats each spatial position as a node and measures the affinity between two nodes with a dot product of their embedded features; RGA-C applies the same construction to the channels, treating each channel's flattened feature map as a node.
Integration Strategy: The paper compares combining the spatial and channel attentions sequentially (spatial then channel, or channel then spatial) and in parallel; the sequential spatial-then-channel variant, RGA-SC, performs best in their experiments.
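A minimal NumPy sketch of the three items above, assuming the formulation summarized in this section: 1×1 convolutions are written as per-node matrix multiplies, batch normalization is omitted, weights are random rather than learned, and the embedding sizes, the reduction ratio `d1`, and all function names (`init_params`, `rga_node_attention`, `rga_spatial`, `rga_channel`, `rga_sc`) are hypothetical. It is not the code from the microsoft/Relation-Aware-Global-Attention-Networks repository; it only shows that RGA-S and RGA-C share one node-level computation and differ in what counts as a node.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(rng, node_dim, num_nodes, emb=8, rel_emb=8, d1=4):
    """Hypothetical random weights for one RGA branch (BN layers omitted)."""
    hidden = max((1 + rel_emb) // d1, 1)
    shapes = {
        "W_theta": (node_dim, emb),          # embedding theta, left side of the affinity
        "W_phi": (node_dim, emb),            # embedding phi, right side of the affinity
        "W_psi": (node_dim, emb),            # embedding psi of the node's own feature
        "W_rho": (2 * num_nodes, rel_emb),   # embedding rho of the 2N-dim relation vector
        "W_1": (1 + rel_emb, hidden),        # first layer, reduces channels by ratio d1
        "W_2": (hidden, 1),                  # second layer, one attention value per node
    }
    return {k: rng.normal(scale=0.1, size=s) for k, s in shapes.items()}


def rga_node_attention(nodes, p):
    """nodes: (N, D), one row per node. Returns (N,) attention values in (0, 1)."""
    theta = relu(nodes @ p["W_theta"])       # (N, emb)
    phi = relu(nodes @ p["W_phi"])           # (N, emb)
    R = theta @ phi.T                        # (N, N), R[i, j] = r_ij
    rel = np.concatenate([R, R.T], axis=1)   # row i = [R[i, :], R[:, i]], length 2N
    rel = relu(rel @ p["W_rho"])             # (N, rel_emb)
    # Embed the node's own feature, then average over channels -> one value per node.
    feat = relu(nodes @ p["W_psi"]).mean(axis=1, keepdims=True)  # (N, 1)
    y = np.concatenate([feat, rel], axis=1)  # (N, 1 + rel_emb)
    return sigmoid(relu(y @ p["W_1"]) @ p["W_2"]).ravel()


def rga_spatial(x, p):
    """RGA-S: nodes are the H*W positions in raster-scan order, each a C-dim feature."""
    C, H, W = x.shape
    a = rga_node_attention(x.reshape(C, H * W).T, p)  # (H*W,)
    return x * a.reshape(1, H, W)


def rga_channel(x, p):
    """RGA-C: nodes are the C channels, each a flattened H*W feature map."""
    C, H, W = x.shape
    a = rga_node_attention(x.reshape(C, H * W), p)    # (C,)
    return x * a.reshape(C, 1, 1)


def rga_sc(x, p_s, p_c):
    """Sequential RGA-SC: spatial attention first, channel attention on its output."""
    return rga_channel(rga_spatial(x, p_s), p_c)


rng = np.random.default_rng(0)
C, H, W = 16, 6, 4
x = rng.normal(size=(C, H, W))
p_s = init_params(rng, node_dim=C, num_nodes=H * W)
p_c = init_params(rng, node_dim=H * W, num_nodes=C)
out = rga_sc(x, p_s, p_c)
print(out.shape)  # (16, 6, 4)
```

Because the relation vector has length 2N, the relation embedding `W_rho` in this sketch is sized for a fixed number of nodes, so a spatial branch built for one feature-map resolution cannot be reused at another without new weights. Swapping the order inside `rga_sc` gives the channel-then-spatial variant; the parallel combination is not sketched here.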

Experimental Validation

Through comprehensive ablation studies, the authors validate the contribution of each component. For instance, using only the global relations, without the original features, still clearly outperforms the baseline, which underscores the role of relations in inferring attention. The RGA modules also consistently outperform other attention methods such as CBAM and non-local networks, an advantage the authors attribute to their use of global-scope structural information.

Theoretical and Practical Implications

Conceptually, the method shows that pairwise relations across the whole feature map can be used to infer attention, rather than relying on local context alone. Practically, concentrating the features on discriminative regions of person images improves re-id performance. The public source code makes it possible to reproduce the results and to integrate the module into existing frameworks.

Future Prospects

The approach of mining global relations suggests possible extensions to other computer vision applications in which global context plays a crucial role. Future work may apply the idea beyond person re-id, for example to attention mechanisms for object detection and video analysis.

In summary, relation-aware global attention advances person re-identification by inferring attention from global structural relations instead of local convolutions alone, pairing a new way of learning attention with state-of-the-art benchmark results.