Pose-Normalized Image Generation for Person Re-identification (1712.02225v6)

Published 6 Dec 2017 in cs.CV, cs.AI, cs.MM, and stat.ML

Abstract: Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations. In this work, we address both problems by proposing a novel deep person image generation model for synthesizing realistic person images conditional on the pose. The model is based on a generative adversarial network (GAN) designed specifically for pose normalization in re-id, thus termed pose-normalization GAN (PN-GAN). With the synthesized images, we can learn a new type of deep re-id feature free of the influence of pose variations. We show that this feature is strong on its own and complementary to features learned with the original images. Importantly, under the transfer learning setting, we show that our model generalizes well to any new re-id dataset without the need for collecting any training data for model fine-tuning. The model thus has the potential to make re-id models truly scalable.

Authors (8)

Xuelin Qian (31 papers)
Yanwei Fu (199 papers)
Tao Xiang (324 papers)
Wenxuan Wang (128 papers)
Jie Qiu (19 papers)
Yang Wu (175 papers)
Yu-Gang Jiang (223 papers)
Xiangyang Xue (169 papers)

Citations (429)

Summary

Insights into Pose-Normalized Image Generation for Person Re-identification

This paper approaches person re-identification (re-id) through pose-normalized image generation. It targets two challenges: the lack of cross-view paired training data, and large pose variation, which makes it hard to learn features that are discriminative, identity-sensitive, and view-invariant. The method uses a generative adversarial network (GAN), termed pose-normalization GAN (PN-GAN), to synthesize person images conditioned on a target pose, so that re-id features can be learned free of the influence of pose variation.

Key Methodological Contributions

Pose-Normalization GAN (PN-GAN): The PN-GAN is designed to generate realistic and identity-preserving images free of pose-induced appearance changes. By creating images of individuals in canonical poses, the model enriches the training dataset and facilitates the extraction of re-id features that are less sensitive to pose variations.
Enhanced Feature Learning: The method yields new deep re-id features derived from synthesized images. These pose-normalized features complement those learned from the original images, resulting in a comprehensive feature set that better captures identity-specific patterns across different views.
Scalability and Generalizability: Importantly, the paper demonstrates that the PN-GAN can be applied effectively to new re-id datasets without the burdensome task of collecting additional labeled data for fine-tuning. This characteristic is crucial for deploying re-id systems across large-scale camera networks where manual data annotation is infeasible.
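
To make the idea of complementary features concrete, here is a minimal sketch of how a feature from an original image might be combined with features from its pose-normalized versions. The fusion rule used here (element-wise maximum over L2-normalized vectors) is an assumption for illustration, as are the function names `l2_normalize` and `fuse_features` and the toy 3-D vectors; the summary above does not specify how the paper combines the two kinds of features.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def fuse_features(original_feat, pose_feats):
    """Fuse one original-image feature with features of its pose-normalized images.

    Assumption: fusion is an element-wise max over L2-normalized vectors.
    Other choices (concatenation, averaging) would fit the same interface.
    """
    vectors = [l2_normalize(original_feat)] + [l2_normalize(f) for f in pose_feats]
    # zip(*vectors) groups the values of each dimension across all vectors.
    return [max(dims) for dims in zip(*vectors)]


# Toy example: one original feature and two pose-normalized features.
# A real system would use high-dimensional CNN features and, per the
# summary, eight canonical poses.
original = [3.0, 0.0, 4.0]
pose_normalized = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
fused = fuse_features(original, pose_normalized)
print(fused)
```

With an element-wise maximum, each dimension keeps the strongest response seen in any view of the person, which is one simple way to let pose-normalized features fill in what the original image misses.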

Experimental Validation

The approach is validated through experiments on several re-id benchmarks, including the Market-1501, CUHK03, DukeMTMC-reID, and CUHK01 datasets. The results consistently indicate that incorporating pose-normalized images in training improves re-id performance, with gains in both Rank-1 accuracy and mean Average Precision (mAP).

Comparison with Baselines: The paper reports improvements even over strong baseline models. For instance, on Market-1501, the proposed method achieves an mAP of 72.58%, surpassing various existing re-id models that do not employ pose normalization.
Supervised vs. Transfer Learning: The proposed approach is tested under both supervised learning (SL) and transfer learning (TL) settings. In TL settings, where the model is pre-trained on Market-1501 and tested directly on other datasets, the approach shows competitive performance, underscoring its potential for real-world scalability and adaptability.
Impact of Canonical Poses: Experiments with different numbers of canonical poses reveal that using multiple canonical poses (eight in this paper) contributes significantly to the robustness and accuracy of feature extraction.
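
Rank-1 accuracy and mAP, the two metrics reported above, can be computed from a query-by-gallery distance matrix. The sketch below is a simplified illustration, not the official evaluation code of any benchmark: it omits the same-camera filtering that standard re-id protocols apply, and the function name `rank1_and_map` and the toy data are hypothetical.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mAP from a query-by-gallery distance matrix.

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar). Queries with no true match in the
    gallery are skipped. Same-camera filtering is NOT applied here.
    """
    hits = 0
    ap_sum = 0.0
    valid = 0
    for q, row in enumerate(dist):
        # Rank gallery indices from most to least similar.
        order = sorted(range(len(row)), key=row.__getitem__)
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue
        valid += 1
        hits += int(matches[0])  # Rank-1: is the top-ranked item correct?
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each true match
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        raise ValueError("no query has a matching identity in the gallery")
    return hits / valid, ap_sum / valid


# Toy example: two queries, three gallery images.
query_ids = ["a", "b"]
gallery_ids = ["a", "b", "a"]
dist = [
    [0.1, 0.5, 0.3],  # query "a": both true matches ranked first
    [0.2, 0.4, 0.1],  # query "b": its only true match is ranked last
]
rank1, mean_ap = rank1_and_map(dist, query_ids, gallery_ids)
print(rank1, mean_ap)
```

In the toy data, the first query has average precision 1.0 and the second 1/3, so Rank-1 is 0.5 and mAP is 2/3. Rank-1 only looks at the top result, while mAP rewards ranking every true match early, which is why the two metrics are usually reported together.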

Practical and Theoretical Implications

The findings from this research have several implications:

Practical Deployment: The method alleviates the dependency on large, labeled re-id datasets, making it viable for use in expansive networks with numerous camera views, such as airports or public transport stations.
Theoretical Advances: By disentangling pose from identity in image generation, the paper contributes to the understanding of how multiple factors of variation interact in feature learning. Synthesizing realistic images that preserve identity attributes while changing pose points to a promising direction for conditional image generation tasks in computer vision.

Future Directions

The framework established by this paper opens up several avenues for future research, including:

Exploration of Alternative Deep Models: While ResNet-50 serves as a strong baseline, experimentation with other architectures could further optimize the balance of computational cost and accuracy.
Expansion to Other Domains: The underlying principles of pose normalization may prove beneficial in related domains, such as activity recognition or face re-id, where pose variations similarly complicate feature extraction.

In conclusion, this paper contributes a methodologically sound and practically significant approach to improving person re-identification through pose-normalized image generation, substantiating its claims with rigorous experimentation and setting the stage for future advancements in scalable re-id technologies.
