Attention-Aware Compositional Network for Person Re-identification (1805.03344v2)

Published 9 May 2018 in cs.CV

Abstract: Person re-identification (ReID) is to identify pedestrians observed from different camera views based on visual appearance. It is a challenging task due to large pose variations, complex background clutters and severe occlusions. Recently, human pose estimation by predicting joint locations was largely improved in accuracy. It is reasonable to use pose estimation results for handling pose variations and background clutters, and such attempts have obtained great improvement in ReID performance. However, we argue that the pose information was not well utilized and hasn't yet been fully exploited for person ReID. In this work, we introduce a novel framework called Attention-Aware Compositional Network (AACN) for person ReID. AACN consists of two main components: Pose-guided Part Attention (PPA) and Attention-aware Feature Composition (AFC). PPA is learned and applied to mask out undesirable background features in pedestrian feature maps. Furthermore, pose-guided visibility scores are estimated for body parts to deal with part occlusion in the proposed AFC module. Extensive experiments with ablation analysis show the effectiveness of our method, and state-of-the-art results are achieved on several public datasets, including Market-1501, CUHK03, CUHK01, SenseReID, CUHK03-NP and DukeMTMC-reID.

Authors (5)

Jing Xu
Rui Zhao
Feng Zhu
Huaming Wang
Wanli Ouyang

Citations (435)

Summary

The paper "Attention-Aware Compositional Network for Person Re-identification" presents a framework for person re-identification (ReID) that uses human pose estimation to make pedestrian matching more accurate and robust. It targets three problems that degrade ReID performance: pose variation, background clutter, and occlusion.

Contribution and Framework

The proposed framework, the Attention-Aware Compositional Network (AACN), has two main components: Pose-guided Part Attention (PPA) and Attention-aware Feature Composition (AFC). Together they aim to make the features extracted for ReID more precisely localized and more discriminative.

Pose-guided Part Attention (PPA): The PPA module in the AACN is designed to improve part attention estimation by integrating pose estimation results, which guide the learning of part-specific attention maps. By doing so, PPA effectively filters out background clutter and prevents noise from adjacent regions, thereby enhancing the accuracy of feature localization compared to conventional rectangular Region of Interest (RoI) approaches.
Attention-aware Feature Composition (AFC): This component uses the part-specific attention maps generated by PPA to produce aligned feature representations. The AFC module then re-weights these part features using pose-guided visibility scores that estimate how occluded each body part is. A part judged to be occluded therefore contributes less to the final representation, which makes matching more robust to occlusion.
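The interaction between the two components can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the attention-weighted average pooling, and the concatenation of part features are assumptions chosen for clarity, and plain Python lists stand in for the network's feature maps, attention maps, and visibility scores.

```python
def attention_pool(feature_map, attention):
    """Attention-weighted average of a C x H x W feature map under one H x W map.

    Assumption: the attention map is normalized by its own sum before pooling.
    """
    total = sum(sum(row) for row in attention)
    if total == 0:
        # The part receives no attention anywhere; return a zero feature.
        return [0.0] * len(feature_map)
    pooled = []
    for channel in feature_map:
        weighted = sum(
            f * a
            for f_row, a_row in zip(channel, attention)
            for f, a in zip(f_row, a_row)
        )
        pooled.append(weighted / total)
    return pooled


def compose_features(feature_map, part_attentions, visibility):
    """Concatenate per-part pooled features, each scaled by its visibility score."""
    composed = []
    for attention, score in zip(part_attentions, visibility):
        part_feature = attention_pool(feature_map, attention)
        composed.extend(score * x for x in part_feature)
    return composed


# Toy input: 2 channels on a 2 x 2 grid, two body parts.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
part_attentions = [
    [[1.0, 0.0], [0.0, 0.0]],  # part 0 attends to the top-left cell only
    [[0.0, 0.0], [1.0, 1.0]],  # part 1 attends to the bottom row
]
visibility = [1.0, 0.5]  # hypothetical scores: part 1 is treated as half occluded

descriptor = compose_features(feature_map, part_attentions, visibility)
print(descriptor)
```

In this toy case, cells outside a part's attention map contribute nothing to that part's feature, and the half-occluded part's feature is scaled down before it joins the descriptor.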

Experiments and Results

The authors evaluate AACN on six benchmark datasets: Market-1501, CUHK03, CUHK01, SenseReID, CUHK03-NP, and DukeMTMC-reID. They report state-of-the-art results on these datasets, supported by an ablation analysis of the framework's components. Performance is discussed in terms of rank-1 accuracy (the fraction of queries whose top-ranked gallery image shows the correct identity) and mean Average Precision (mAP).
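For readers unfamiliar with the two metrics, the following sketch shows how rank-1 accuracy and mAP can be computed from a query-by-gallery distance matrix. It is a simplified illustration with hypothetical function names and toy data, not the evaluation code used in the paper. Benchmark protocols typically also discard gallery images taken by the query's own camera, a step omitted here.

```python
def average_precision(ranked_matches):
    """AP for one query, given a ranked list of booleans (True = same identity)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this correct match
    return precision_sum / hits if hits else 0.0


def evaluate(distances, query_ids, gallery_ids):
    """Return (rank-1 accuracy, mAP) from a query x gallery distance matrix."""
    rank1_hits = 0
    ap_values = []
    for q_id, row in zip(query_ids, distances):
        # Sort gallery indices from most to least similar (smallest distance first).
        order = sorted(range(len(row)), key=lambda j: row[j])
        matches = [gallery_ids[j] == q_id for j in order]
        rank1_hits += int(matches[0])
        ap_values.append(average_precision(matches))
    n = len(query_ids)
    return rank1_hits / n, sum(ap_values) / n


# Toy data: two queries, three gallery images (identities "a", "b", "a").
query_ids = ["a", "b"]
gallery_ids = ["a", "b", "a"]
distances = [
    [0.1, 0.5, 0.9],  # query "a": closest gallery image is a correct match
    [0.2, 0.4, 0.3],  # query "b": the correct match is ranked last
]

rank1, mean_ap = evaluate(distances, query_ids, gallery_ids)
print(f"rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

Rank-1 only checks the top result, while mAP rewards placing every correct gallery image near the top of the ranking, so the two metrics can move independently.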

Implications and Future Directions

This work contributes to ReID research by combining pose estimation with attention mechanisms to improve feature extraction and alignment. The practical relevance is clearest in surveillance systems, where accurate, real-time pedestrian identification is needed. More broadly, the framework suggests that pose information can be integrated more effectively into visual recognition tasks.

Future work could extend the AACN framework to more general object recognition tasks or reduce its computational cost, which would broaden its applicability. Adaptive approaches that account for system resource constraints could make AACN easier to deploy in resource-limited environments. Modeling human attributes alongside the current attention and composition strategies may also offer further gains on ReID problems.
