- The paper introduces PDA-Net, a framework that separates pose and domain features to enable unsupervised cross-dataset person re-identification.
- It employs adversarial learning and Maximum Mean Discrepancy loss to align feature distributions for robust re-ID across varying datasets.
- Experimental results on Market-1501 and DukeMTMC-reID demonstrate significant improvements in rank-1 accuracy and mean Average Precision.
Overview of Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation
The paper presents a novel framework for the cross-dataset person re-identification (re-ID) problem, which is crucial in surveillance and smart-city applications. The principal challenge is the absence of identity labels in the target domain, which makes target-domain learning unsupervised. The proposed Pose Disentanglement and Adaptation Network (PDA-Net) is a deep learning model that separates pose from domain-specific features and adapts them across datasets without requiring identity supervision in the target domain.
Technical Approach
The PDA-Net model is designed to learn features that are both domain-invariant and pose-disentangled, which facilitates re-ID across different datasets. The framework rests on three key components:
- Pose Disentanglement: The model separates pose information from the identity or style-related features, allowing the extraction and adaptation of pose-independent features. This disentanglement is achieved through the use of encoders and adversarial learning with a shared pose discriminator, enabling pose-guided image synthesis and translation without requiring pre-defined pose categories.
- Domain Adaptation: PDA-Net extracts domain-invariant features so that identity matching is not affected by dataset-specific styles or biases. This is accomplished with a Maximum Mean Discrepancy (MMD) loss that aligns the feature distributions of the source and target datasets.
- Generators and Discriminators: The framework uses domain-specific generators and discriminators to reconstruct and synthesize images. Joint training of these components enables effective pose-guided image generation both within and across domains, with adversarial losses ensuring realistic outputs.
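To make the MMD alignment concrete, here is a minimal NumPy sketch of the squared-MMD statistic between source and target feature batches. The Gaussian (RBF) kernel, its bandwidth, and the feature dimensions below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=2.0):
    """RBF kernel matrix between two batches of feature vectors."""
    # Squared Euclidean distances between all pairs, shape (n_x, n_y)
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(source, target, sigma=2.0):
    """Biased estimate of squared MMD between source and target feature sets."""
    k_ss = gaussian_kernel(source, source, sigma)
    k_tt = gaussian_kernel(target, target, sigma)
    k_st = gaussian_kernel(source, target, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

# Features drawn from the same distribution give a small MMD;
# a shifted (misaligned) distribution gives a larger one.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(256, 8))        # "source-domain" features
tgt_same = rng.normal(0.0, 1.0, size=(256, 8))   # aligned target features
tgt_shift = rng.normal(1.0, 1.0, size=(256, 8))  # misaligned target features
print(mmd2(src, tgt_same) < mmd2(src, tgt_shift))  # → True
```

Minimizing this quantity over the encoder's outputs pulls the two feature distributions together; in practice the kernel bandwidth is often set by a median heuristic rather than fixed as above.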
Experimental Evaluation
The paper benchmarks PDA-Net on the Market-1501 and DukeMTMC-reID datasets. The experimental results show significant improvements over state-of-the-art methods, indicating the model's superior ability to adapt across datasets without target-domain labels. Specifically, PDA-Net achieves high rank-1 accuracy and mAP (mean Average Precision) scores, substantiating its ability to generalize to target domains with no labeled data.
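Rank-1 accuracy and mAP are the standard re-ID metrics used above. As a simplified sketch (it ignores the same-camera and junk-image filtering of the official Market-1501 protocol, and assumes every query identity appears in the gallery), they can be computed from query/gallery features as follows:

```python
import numpy as np

def evaluate(query_feats, gallery_feats, query_ids, gallery_ids):
    """Simplified rank-1 accuracy and mAP from L2 distances (no camera filtering)."""
    # Pairwise L2 distances, shape (n_query, n_gallery)
    dists = np.linalg.norm(query_feats[:, None, :] - gallery_feats[None, :, :], axis=2)
    rank1_hits, aps = [], []
    for i, qid in enumerate(query_ids):
        order = np.argsort(dists[i])           # gallery indices, nearest first
        matches = gallery_ids[order] == qid    # boolean relevance at each rank
        rank1_hits.append(matches[0])          # is the nearest gallery image a match?
        # Average precision: mean of precision at each rank holding a true match
        rel_ranks = np.where(matches)[0]
        precisions = (np.arange(len(rel_ranks)) + 1) / (rel_ranks + 1)
        aps.append(precisions.mean())
    return float(np.mean(rank1_hits)), float(np.mean(aps))

# Toy example: one query whose nearest gallery image shares its identity.
q = np.array([[0.0, 0.0]])
g = np.array([[0.0, 0.1], [5.0, 5.0]])
print(evaluate(q, g, np.array([1]), np.array([1, 2])))  # → (1.0, 1.0)
```

The official evaluation additionally excludes gallery images taken by the same camera as the query, so published numbers are not directly reproducible with this sketch.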
Discussion on Implications and Future Directions
The implications of this research are substantial for practical applications in environments where collecting labeled data is infeasible. The unsupervised learning strategy employed by PDA-Net aligns with real-world scenarios, making it highly applicable to evolving surveillance systems that require minimal manual intervention.
Theoretical implications also suggest a potential expansion into multi-modality re-ID tasks, where attributes beyond the visual domain could be incorporated. Future developments may include enhancing the model's robustness against diverse poses and appearances, along with potential applications to other cross-domain recognition tasks beyond person re-ID.
In closing, the paper presents a compelling advancement in unsupervised learning within person re-ID, addressing both pose and domain challenges effectively. As the field progresses, such frameworks will likely contribute significantly to achieving more autonomous and generalized AI vision systems.