Overview of AANet: Attribute Attention Network for Person Re-Identification
The paper proposes and evaluates a novel architecture, the Attribute Attention Network (AANet), designed to improve the accuracy of person re-identification (re-ID) systems. Whereas prior re-ID models rely largely on body parts and human pose estimates to optimize performance, AANet additionally exploits person attributes such as clothing details and hair to build a more discriminative model for person re-identification.
Technical Contributions and Architecture
AANet is built on a ResNet-50 backbone and integrates three sub-networks into a unified learning framework: the Global Feature Network (GFN), the Part Feature Network (PFN), and the Attribute Feature Network (AFN). Each sub-network focuses on a different aspect of the input and contributes to identity classification:
- Global Feature Network (GFN): Primarily tasked with image-level identity classification, delivering a holistic representation of the queried image.
- Part Feature Network (PFN): This sub-network refines the identity recognition process by detecting specific body parts, reducing the impact of background noise and occlusions.
- Attribute Feature Network (AFN): Central to AANet's innovation, the AFN incorporates attribute classification and generates an Attribute Attention Map (AAM). Using Class Activation Maps (CAM), AFN localizes and highlights attribute-specific regions, contributing significantly to identity discrimination.
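The CAM step at the heart of the AFN projects the attribute classifier's weights back onto the backbone's convolutional features to localize attribute-specific regions. A minimal NumPy sketch of that idea follows; all function names, shapes, and the element-wise-max fusion rule are illustrative assumptions, not taken from the paper's implementation:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Compute a Class Activation Map (CAM) for one attribute class.

    feature_maps: (C, H, W) conv features from the backbone.
    fc_weights:   (num_classes, C) weights of the attribute classifier head.
    class_idx:    index of the attribute class to localize.
    Returns an (H, W) map normalized to [0, 1].
    """
    # Weighted sum of channel maps, using the classifier weights for this class.
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)       # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()        # normalize to [0, 1]
    return cam

def attribute_attention_map(feature_maps, fc_weights, attr_indices):
    """Fuse per-attribute CAMs into one attention map (element-wise maximum)."""
    cams = [class_activation_map(feature_maps, fc_weights, i) for i in attr_indices]
    return np.maximum.reduce(cams)
```

In this sketch the fused map can then be multiplied element-wise with the backbone features, so regions supporting attributes such as clothing or hair are emphasized during identity classification.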
Employment of homoscedastic uncertainty learning ensures the model can optimize multiple task-specific loss functions effectively, fine-tuning the contribution of each sub-network.
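The homoscedastic-uncertainty weighting follows the standard formulation (Kendall et al.), where each task loss is scaled by a learned log-variance term. A hedged NumPy sketch of the combination rule is below; AANet's exact implementation details may differ:

```python
import numpy as np

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using learned homoscedastic uncertainty.

    Each task i contributes exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2)
    is a learnable parameter: tasks with high uncertainty are down-weighted,
    and the additive s_i term keeps the uncertainties from growing unbounded.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))
```

In training, the `log_vars` would be optimized jointly with the network weights, so the relative contribution of the GFN, PFN, and AFN losses is tuned automatically rather than by hand.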
Experimental Results
AANet was evaluated against state-of-the-art re-ID models on the DukeMTMC-reID and Market-1501 datasets, outperforming benchmark methods by 3.36% in mAP and 3.12% in Rank-1 accuracy on DukeMTMC-reID. A further advantage of AANet is its ability to perform attribute prediction and visualization, which improves the system's interpretability in complex, real-world scenarios.
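mAP and Rank-1 accuracy are the standard re-ID evaluation metrics. The simplified sketch below shows how both can be computed from a query-gallery distance matrix; it omits the same-camera filtering used in the full Market-1501 protocol, and all names are illustrative:

```python
import numpy as np

def evaluate_rank1_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mAP from a distance matrix.

    dist: (num_query, num_gallery) pairwise distances (smaller = more similar).
    """
    rank1_hits, average_precisions = [], []
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q])                     # gallery sorted by distance
        matches = gallery_ids[order] == query_ids[q]    # boolean relevance list
        rank1_hits.append(matches[0])                   # is the top match correct?
        if matches.any():
            # Average precision: mean of precision@k over each relevant rank k.
            hits = np.cumsum(matches)
            precisions = hits[matches] / (np.flatnonzero(matches) + 1)
            average_precisions.append(precisions.mean())
        else:
            average_precisions.append(0.0)
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))
```

Rank-1 measures whether the single best gallery match carries the correct identity, while mAP also rewards ranking all correct matches near the top, which is why the two metrics are usually reported together.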
Theoretical and Practical Implications
From a theoretical perspective, AANet underscores the value of incorporating fine-grained attribute attention in image recognition tasks. Integrating attribute features such as clothing color, hair, and accessories yields a more nuanced approach to identity recognition that substantially reduces error rates under commonly challenging imaging conditions, notably occlusions, cluttered backgrounds, and diverse poses.
Practically, the improvements in re-ID systems are significant for security and surveillance industries, where accurate identification of individuals across non-overlapping camera systems is essential. AANet, with its attribute prediction capabilities, also opens avenues for more adaptable re-ID systems that can handle dynamic environments and inconsistent imaging conditions.
Future Directions
Future research could focus on expanding the attribute attention framework to include other modalities of context such as temporal and spatial dynamics in video data, enhancing AANet’s robustness even further. Another potential direction is the exploration of more complex backbone networks to see if they can leverage this refined attribute integration for additional gains in accuracy.
In conclusion, AANet represents a substantial step forward for person re-ID methodologies, effectively integrating attribute attention into a classification model to significantly enhance performance metrics and practical applicability. This research opens promising avenues not only for re-ID systems but potentially for broader applications within the field of computer vision.