Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification

Published 22 Oct 2019 in cs.CV | (1910.10111v1)

Abstract: Person re-identification is a challenging task due to various complex factors. Recent studies have attempted to integrate human parsing results or externally defined attributes to help capture human parts or important object regions. On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes. In this paper, we address the missed contextual cues by exploiting both the accurate human parts and the coarse non-human parts. In our implementation, we apply a human parsing model to extract the binary human part masks and a self-attention mechanism to capture the soft latent (non-human) part masks. We verify the effectiveness of our approach with new state-of-the-art performances on three challenging benchmarks: Market-1501, DukeMTMC-reID and CUHK03. Our implementation is available at https://github.com/ggjy/P2Net.pytorch.

Authors (6)

Citations (174)

Summary

Analyzing Dual Part-Aligned Representations for Person Re-Identification

The paper "Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification" introduces a method for improving the accuracy of person re-identification (Re-ID), a central problem in video surveillance and computer vision. The authors propose a dual part-aligned scheme that combines accurate human part information with contextual cues from non-human regions to reduce the misalignment problems common in Re-ID.

Contribution and Methodology

The dual part-aligned representation comprises two main branches: the human part branch and the latent part branch. The human part branch employs a human parsing model such as CE2P to generate masks for predefined human parts. By aligning features to these masks, the model can extract precise human part representations, improving the robustness against background noise.
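
One simple way to picture "aligning features to masks" is masked average pooling: each binary part mask selects the pixels that belong to a part, and their features are averaged into a single part vector. The sketch below is an illustration under that assumption and not necessarily the paper's exact formulation; the function name masked_average_pool and the flattened list-of-pixels layout are hypothetical.

```python
def masked_average_pool(features, masks):
    """Average pixel features inside each binary part mask.

    features: list of N pixel vectors, each of length C (a flattened feature map).
    masks:    list of K binary masks, each of length N (1 = pixel belongs to part).
    Returns a list of K part vectors, each of length C.
    This is an illustrative assumption, not the paper's exact operation.
    """
    num_channels = len(features[0])
    parts = []
    for mask in masks:
        count = sum(mask)
        if count == 0:
            # Part not visible in this image (e.g. occluded): use a zero vector.
            parts.append([0.0] * num_channels)
            continue
        pooled = [
            sum(m * f[c] for m, f in zip(mask, features)) / count
            for c in range(num_channels)
        ]
        parts.append(pooled)
    return parts


# Four pixels with 2-channel features; two visible parts and one empty part.
pixel_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
part_masks = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
part_vectors = masked_average_pool(pixel_features, part_masks)
```

Background pixels that fall outside every mask contribute to no part vector, which is one way to see why mask alignment suppresses background noise.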

Complementarily, the latent part branch focuses on exploiting contextual information beyond predefined human parts using a self-attention mechanism. This mechanism enables the model to learn latent part representations based on appearance similarities among pixels, capturing both fine-grained human and non-human cues which are often overlooked by conventional models.
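
The idea of grouping pixels by appearance similarity can be sketched with plain dot-product self-attention: each pixel receives a softmax-weighted average of all pixel features, with weights given by feature similarity. This is a simplified illustration that omits the learned query, key, and value projections a trained model would normally use; the function name latent_part_attention is hypothetical.

```python
import math


def latent_part_attention(features):
    """Dot-product self-attention over pixels (no learned projections).

    features: list of N pixel vectors, each of length C.
    Returns (outputs, weights): outputs[i] is the attention-weighted average
    of all pixel features for pixel i, and weights[i][j] is the soft
    affinity between pixels i and j (each row sums to 1).
    """
    num_pixels = len(features)
    num_channels = len(features[0])
    outputs, weights = [], []
    for i in range(num_pixels):
        scores = [
            sum(a * b for a, b in zip(features[i], features[j]))
            for j in range(num_pixels)
        ]
        # Subtract the maximum score for a numerically stable softmax.
        max_score = max(scores)
        exps = [math.exp(s - max_score) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        outputs.append([
            sum(row[j] * features[j][c] for j in range(num_pixels))
            for c in range(num_channels)
        ])
    return outputs, weights


# Pixels 0 and 1 look alike; pixel 2 looks different.
pixel_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
attended, affinity = latent_part_attention(pixel_features)
```

In this toy input, pixels 0 and 1 attend to each other more strongly than to pixel 2, so each row of affinity behaves like a soft mask over a latent part.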

The human and latent part-aligned representations are combined in what the authors call a Dual Part-aligned Block (DPB). With DPBs inserted into a ResNet-50 backbone, the model can handle both well-aligned human features and more difficult-to-capture non-human contextual signals.
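
As a rough sketch of how two branches might be merged inside a block, the example below assumes a residual-style fusion in which both branch outputs are added element-wise to the input features. The exact fusion used in the DPB is not stated in this summary, so the element-wise sum and the function name fuse_dual_parts are assumptions.

```python
def fuse_dual_parts(inputs, human_branch, latent_branch):
    """Element-wise residual sum of the input and the two branch outputs.

    All three arguments are lists of N pixel vectors of length C with
    identical shapes. The summation is an assumed fusion rule, chosen for
    illustration only.
    """
    return [
        [x + h + l for x, h, l in zip(x_row, h_row, l_row)]
        for x_row, h_row, l_row in zip(inputs, human_branch, latent_branch)
    ]


# One pixel with two channels.
fused = fuse_dual_parts([[1.0, 2.0]], [[0.5, 0.5]], [[0.25, 0.0]])
```

A residual form like this keeps the input and output shapes identical, which is what allows such a block to be dropped between existing stages of a backbone.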

Empirical Results

The empirical evaluations confirm the efficacy of the proposed dual part-aligned representation approach. The authors report state-of-the-art performances on three challenging benchmarks: Market-1501, DukeMTMC-reID, and CUHK03. Specifically, the proposed approach consistently outperforms baseline models, achieving notable increases in both Rank-1 accuracy and mean average precision (mAP) across datasets. For instance, on the Market-1501 dataset, the model achieves a Rank-1 accuracy of 95.2% and an mAP of 85.6%, representing significant improvements over previous methods like PCB.
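
For readers unfamiliar with the two metrics, the following sketch shows how Rank-1 accuracy and mAP are commonly computed from ranked retrieval lists. It assumes each query already has a gallery ranking expressed as match flags, and it omits the benchmark-specific filtering of same-camera gallery images; the function names are hypothetical.

```python
def average_precision(ranked_matches):
    """Average precision for one query.

    ranked_matches: list of booleans, gallery items sorted from most to
    least similar; True means the item shares the query's identity.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank1_and_map(all_ranked_matches):
    """Return (Rank-1 accuracy, mean average precision) over all queries."""
    num_queries = len(all_ranked_matches)
    rank1 = sum(1 for r in all_ranked_matches if r and r[0]) / num_queries
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / num_queries
    return rank1, mean_ap


# Two toy queries: the first is matched at rank 1, the second only at rank 2.
toy_rankings = [[True, False, True], [False, True, False]]
rank1, mean_ap = rank1_and_map(toy_rankings)
```

Rank-1 only checks the top result, while mAP rewards placing every correct gallery image high in the list, which is why the two numbers can differ substantially on the same benchmark.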

Implications and Future Work

The dual part-aligned approach addresses misalignment and occlusion in person re-identification more comprehensively than methods that rely on human parts alone. By using both human-centric and contextual cues, it shows the value of representing the whole scene, not only the predefined body parts, in complex visual tasks such as person Re-ID.

The paper suggests several potential directions for future work, including refining the latent part branch to better differentiate between useful non-human contextual cues and noise. Another avenue is the exploration of how dual part-aligned representations could be adapted into other computer vision tasks, potentially extending the applicability of the approach beyond person re-identification.

In conclusion, the paper advances person re-identification by combining parsing-based human part representations with attention-based latent part representations that capture context outside the predefined parts. It provides a solid foundation for future work on misalignment and context modeling in visual recognition.