Pedestrian Alignment Network for Large-scale Person Re-identification (1707.00408v1)

Published 3 Jul 2017 in cs.CV

Abstract: Person re-identification (person re-ID) is mostly viewed as an image retrieval problem. This task aims to search a query person in a large image pool. In practice, person re-ID usually adopts automatic detectors to obtain cropped pedestrian images. However, this process suffers from two types of detector errors: excessive background and part missing. Both errors deteriorate the quality of pedestrian alignment and may compromise pedestrian matching due to the position and scale variances. To address the misalignment problem, we propose that alignment can be learned from an identification procedure. We introduce the pedestrian alignment network (PAN) which allows discriminative embedding learning and pedestrian alignment without extra annotations. Our key observation is that when the convolutional neural network (CNN) learns to discriminate between different identities, the learned feature maps usually exhibit strong activations on the human body rather than the background. The proposed network thus takes advantage of this attention mechanism to adaptively locate and align pedestrians within a bounding box. Visual examples show that pedestrians are better aligned with PAN. Experiments on three large-scale re-ID datasets confirm that PAN improves the discriminative ability of the feature embeddings and yields competitive accuracy with the state-of-the-art methods.

Authors (3)

Zhedong Zheng (67 papers)
Liang Zheng (181 papers)
Yi Yang (856 papers)

Citations (472)

Summary

An Overview of the Pedestrian Alignment Network for Large-scale Person Re-identification

The paper "Pedestrian Alignment Network for Large-scale Person Re-identification" addresses the misalignment that automatic pedestrian detectors introduce into person re-identification (re-ID). The authors introduce the Pedestrian Alignment Network (PAN), which learns pedestrian alignment jointly with discriminative embeddings and requires no annotations beyond identity labels.

Core Contributions

The research primarily offers the following contributions:

Integration of Alignment and Identification: PAN learns to align pedestrians within detected bounding boxes and to identify them at the same time. It relies on the observation that a CNN trained to discriminate between identities usually produces strong activations on the human body rather than on the background.
Automatic Correction of Misalignment: The network addresses the two detector errors named in the abstract, excessive background and missing body parts. Using the CNN's attention on the body, PAN adaptively re-localizes the pedestrian, which yields better-aligned inputs and higher re-ID accuracy.
Competitive Performance: Experiments on three large-scale re-ID datasets (Market-1501, CUHK03, and DukeMTMC-reID) show that PAN improves over the baseline and reaches rank-1 accuracy and mean Average Precision (mAP) competitive with state-of-the-art methods.

Technical Details

The PAN architecture consists of two convolutional branches, a base branch and an alignment branch, connected by an affine estimation branch:

Base and Alignment Branches: These branches perform identification tasks by predicting class probabilities for pedestrians. They distinguish between identities while learning spatial localization cues.
Affine Estimation: This branch uses a Spatial Transformer Network (STN) to predict the affine transformation parameters used to re-localize the pedestrian. The aim is to reduce the scale and position variance caused by detection errors.
Pedestrian Descriptor Fusion: Features from the two branches are combined into a single descriptor. The paper shows that the original and aligned inputs are complementary, so the fused descriptor is more discriminative than either one alone.
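
As a rough illustration of the affine estimation and descriptor fusion steps listed above, the following plain-Python sketch warps a single-channel feature map with a 2x3 affine matrix using bilinear sampling, in the manner of a spatial transformer, and then fuses two descriptors. The function names, the normalized [-1, 1] coordinate convention, the zero padding outside the map, and the normalize-then-concatenate fusion are all assumptions made for illustration; the summary does not specify these details, and in PAN the matrix would be predicted by the learned affine estimation branch rather than written by hand.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Map every output pixel to a source location in normalized [-1, 1] coordinates.

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]] (six parameters).
    Returns an out_h x out_w grid of (x_source, y_source) pairs.
    """
    grid = []
    for i in range(out_h):
        y = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(fmap, grid):
    """Sample a 2D feature map (list of rows) at the grid locations.

    Locations outside the map contribute zero (an assumed padding choice).
    """
    h, w = len(fmap), len(fmap[0])

    def at(r, c):
        return fmap[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalized coordinates to pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            dx, dy = px - x0, py - y0
            # Weighted average of the four neighbouring pixels.
            out_row.append(
                at(y0, x0) * (1 - dx) * (1 - dy)
                + at(y0, x0 + 1) * dx * (1 - dy)
                + at(y0 + 1, x0) * (1 - dx) * dy
                + at(y0 + 1, x0 + 1) * dx * dy
            )
        out.append(out_row)
    return out


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_descriptors(f_base, f_align):
    """Assumed fusion rule: L2-normalize each branch descriptor, then concatenate."""
    return l2_normalize(f_base) + l2_normalize(f_align)
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the warp returns the input unchanged, while `[[0.5, 0, 0], [0, 0.5, 0]]` samples only the central half of the map, which corresponds to cropping away excess background. Scale factors above 1 sample beyond the map border, so the padding fills in the area that a too-tight detection box cut off.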

Results and Observations

The paper reports that PAN improves accuracy on standard re-ID benchmarks:

On the Market-1501 dataset, PAN achieves a rank-1 accuracy of 82.81% and an mAP of 63.35%.
Similar improvements are observed on the CUHK03 and DukeMTMC-reID datasets, which indicates that the gains are consistent across benchmarks rather than specific to one dataset.
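
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how rank-1 accuracy and mAP can be computed from ranked retrieval results. It assumes each query has already been reduced to a list of booleans marking which gallery images, sorted from most to least similar, share the query's identity. The function names are hypothetical, and the official benchmark protocols include further steps, such as filtering certain gallery images, that are not shown here.

```python
def rank1_and_ap(match_flags):
    """Return (rank-1 hit, average precision) for one query.

    match_flags[k] is True when the gallery image at rank k+1 has the
    same identity as the query.
    """
    n_pos = sum(1 for f in match_flags if f)
    if n_pos == 0:
        raise ValueError("query has no matching gallery image")
    rank1 = 1.0 if match_flags[0] else 0.0
    hits = 0
    precision_sum = 0.0
    for k, is_match in enumerate(match_flags, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / k  # precision at this recall point
    return rank1, precision_sum / n_pos


def evaluate(all_match_flags):
    """Average the per-query results into rank-1 accuracy and mAP."""
    results = [rank1_and_ap(flags) for flags in all_match_flags]
    n = len(results)
    rank1 = sum(r for r, _ in results) / n
    mean_ap = sum(ap for _, ap in results) / n
    return rank1, mean_ap
```

Rank-1 only asks whether the top result is correct, whereas mAP rewards placing every true match near the top of the list. This is why the two numbers can differ substantially on the same dataset.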

Moreover, the research identifies scope for future exploration in reducing computational overhead and adapting the model to other domains such as vehicle recognition.

Implications and Future Directions

PAN couples pedestrian alignment with identity recognition without requiring additional annotations. This joint training improves re-ID performance by correcting the misalignment that detectors introduce.

On the theoretical side, the paper shows how the attention that emerges in an identification network can be reused for spatial alignment. On the practical side, it promises more accurate matching in surveillance systems that rely on automatic detectors.

Future research could explore:

Reducing network complexity for faster execution.
Generalizing PAN's framework to recognition tasks beyond pedestrians, such as vehicle recognition.

In conclusion, learning alignment jointly with identification improves person re-identification without extra annotation, and PAN is a notable step in that direction.
