Pose-driven Deep Convolutional Model for Person Re-identification
The paper "Pose-driven Deep Convolutional Model for Person Re-identification" addresses the significant challenges in person re-identification (ReID) that arise from large variations in pose and viewpoint across images captured by surveillance systems. These variations complicate robust feature extraction and matching, both of which are critical for accurately identifying individuals across multiple camera views.
The authors propose a Pose-driven Deep Convolutional (PDC) model designed to mitigate these issues by incorporating human pose information into the deep learning framework. The PDC model is notable for its use of two novel sub-networks: the Feature Embedding sub-Net (FEN) and the Feature Weighting sub-Net (FWN).
Key Innovations
The PDC model introduces several key innovations:
- Feature Embedding sub-Net (FEN):
- This sub-network uses human pose cues to transform body parts into normalized regions.
- It leverages a Pose Transformer Network (PTN) to perform affine transformations, which align the body parts into consistent, easily recognizable formats despite variations in pose.
- The FEN extracts six parts from the human body: head, upper body, left arm, right arm, left leg, and right leg, normalizing these parts using cropping, resizing, and rotation to mitigate the impact of pose variations.
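At the core of this normalization is an affine transformation of each part region. The snippet below is a minimal illustrative sketch, not the paper's implementation: the function names and the fixed rotation/scale/translation parameters are hypothetical stand-ins for what the PTN learns, shown here only to make the geometry of the per-part transform concrete.

```python
import math

def affine_params(angle_deg, scale, tx, ty):
    """Build a 2x3 affine matrix combining rotation, uniform scaling, and
    translation -- the kind of transform a PTN-style module would predict
    to map a detected body part into a canonical, pose-normalized frame."""
    a = math.radians(angle_deg)
    return [[scale * math.cos(a), -scale * math.sin(a), tx],
            [scale * math.sin(a),  scale * math.cos(a), ty]]

def warp_point(theta, x, y):
    """Map a source coordinate (x, y) through the affine matrix theta."""
    return (theta[0][0] * x + theta[0][1] * y + theta[0][2],
            theta[1][0] * x + theta[1][1] * y + theta[1][2])

# An identity transform (no rotation, unit scale, no shift) leaves
# coordinates unchanged; a learned transform would instead align a
# tilted arm or leg crop to an upright canonical region.
theta = affine_params(0.0, 1.0, 0.0, 0.0)
print(warp_point(theta, 3.0, 4.0))  # -> (3.0, 4.0)
```

In practice the same matrix would be applied to a dense grid of pixel coordinates (with interpolation) rather than to single points, which is exactly what cropping, resizing, and rotating a part region amounts to.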
- Feature Weighting sub-Net (FWN):
- The FWN dynamically learns the weights for different body parts to optimize feature representation and similarity measurement.
- This sub-network aims to emphasize more discriminative parts while down-weighting less informative or noisy parts, enhancing the overall robustness of the feature matching process.
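The weighting idea can be sketched with a few lines of plain Python. This is an assumption-laden simplification, not the FWN's actual architecture: part features are flat lists, the per-part scores stand in for learned parameters, and softmax normalization is one plausible choice for keeping the weights comparable.

```python
import math

def softmax(scores):
    """Normalize raw part scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weight_parts(part_feats, scores):
    """Scale each part's feature vector by its normalized weight, so that
    discriminative parts dominate the final descriptor while noisy or
    occluded parts are down-weighted."""
    ws = softmax(scores)
    return [[w * v for v in feat] for w, feat in zip(ws, part_feats)]

# Equal scores yield equal weights; a higher score for, say, the upper
# body would amplify that part's contribution to similarity matching.
weighted = weight_parts([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

The weighted part features would then be concatenated (or summed) with the global feature before computing distances between identities.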
Experimental Results
The proposed PDC model is evaluated on three widely used person ReID datasets: CUHK03, Market1501, and VIPeR, and compared against state-of-the-art methods on each benchmark:
- CUHK03 Dataset:
- The PDC model achieves a rank-1 accuracy of 78.29% and 88.70% on the detected and labeled datasets, respectively. This performance surpasses that of other methods like PIE [62] and Spindle [16].
- Market1501 Dataset:
- The model exhibits significant improvements with a mean Average Precision (mAP) of 63.41% and a rank-1 accuracy of 84.14%. These metrics highlight the model's effectiveness in handling large-scale datasets with numerous individuals and complex viewpoints.
- VIPeR Dataset:
- The PDC model obtains a rank-1 accuracy of 51.27%, demonstrating competitive performance against other methods, particularly those considering pose information.
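For readers unfamiliar with the metrics above, rank-1 accuracy is the fraction of query images whose nearest gallery image (under the learned distance) shares the query's identity. The following is a minimal illustrative sketch of that computation, not code from the paper; the distance matrix and identity lists are hypothetical inputs.

```python
def rank1_accuracy(dist, query_ids, gallery_ids):
    """Compute CMC rank-1 accuracy.

    dist[i][j] is the distance between query i and gallery item j
    (smaller means more similar). A query counts as a hit when its
    single closest gallery item has the same identity label."""
    hits = 0
    for i, row in enumerate(dist):
        best = min(range(len(row)), key=row.__getitem__)  # nearest gallery index
        hits += int(gallery_ids[best] == query_ids[i])
    return hits / len(dist)

# Two queries, two gallery items: each query is closest to the gallery
# item with the matching identity, so rank-1 accuracy is 1.0.
dist = [[0.1, 0.9],
        [0.8, 0.2]]
print(rank1_accuracy(dist, [1, 2], [1, 2]))  # -> 1.0
```

mAP extends this by averaging precision over the full ranked list for each query, which rewards placing all correct gallery matches near the top rather than just the first one.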
Implications and Future Directions
The implications of these findings are substantial for both practical and theoretical aspects of person ReID. Practically, the ability to incorporate pose information into deep learning frameworks can markedly improve the accuracy and robustness of surveillance systems. Theoretically, the paper underscores the importance of pose normalization and adaptive feature weighting, suggesting new avenues for research in feature learning and similarity measurement.
Looking forward, the integration of more sophisticated pose estimation techniques and the exploration of additional feature weighting mechanisms could further enhance the model's performance. Additionally, extending this approach to other related domains, such as action recognition or multi-object tracking, could provide broader applications for the foundational concepts introduced in this paper.
In conclusion, the Pose-driven Deep Convolutional Model represents a notable advance in the field of person re-identification, providing a robust solution to the challenge of pose variation through innovative use of pose-driven feature embedding and adaptive feature weighting. The experimental results support the effectiveness of the proposed approach, setting the stage for further research and development in this area.