Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer (2112.02466v2)

Published 5 Dec 2021 in cs.CV

Abstract: Occluded person re-identification is a challenging task, as human body parts can be occluded by obstacles (e.g., trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods address this problem by aligning body parts according to graph matching, but these graph-based methods are complicated and unintuitive. Therefore, we propose a transformer-based Pose-guided Feature Disentangling (PFD) method that utilizes pose information to clearly disentangle semantic components (e.g., human body or joint parts) and selectively match non-occluded parts correspondingly. First, a Vision Transformer (ViT) is used to extract patch features with its strong representational capability. Second, to preliminarily disentangle pose information from patch information, a matching and distributing mechanism is leveraged in the Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views is introduced in the transformer decoder to implicitly enhance the disentangled body part features. However, those semantic views are not guaranteed to be related to the body without additional supervision. Therefore, a Pose-View Matching (PVM) module is proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent interference from occlusions, we design a Pose-guided Push Loss to emphasize the features of visible body parts. Extensive experiments on five challenging datasets covering two tasks (occluded and holistic Re-ID) demonstrate that the proposed PFD is superior, performing favorably against state-of-the-art methods. Code is available at https://github.com/WangTaoAs/PFD_Net

Authors (5)
  1. Tao Wang (700 papers)
  2. Hong Liu (396 papers)
  3. Pinhao Song (13 papers)
  4. Tianyu Guo (33 papers)
  5. Wei Shi (116 papers)
Citations (142)

Summary

  • The paper introduces a transformer-based Pose-guided Feature Disentangling method that selectively matches non-occluded body parts.
  • It employs innovative modules like PFA, PVM, and a Pose-guided Push Loss to effectively separate occlusion from useful features, boosting rank-1 and mAP scores.
  • Extensive experiments on five datasets, including Occluded-Duke and Occluded-REID, validate its superior performance in complex surveillance scenarios.

Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

Occluded person re-identification (Re-ID) is a difficult problem in computer vision, primarily because body parts can be occluded by objects such as trees, cars, or other pedestrians, which complicates identity matching across non-overlapping camera views. Traditional methods based on spatial alignment are often complex and handle such partial occlusions inefficiently. The paper under discussion introduces a novel transformer-based approach, termed Pose-guided Feature Disentangling (PFD), which employs pose information to disentangle semantic components and selectively match non-occluded parts, aiming to improve occluded person Re-ID.

In their proposed methodology, the authors deploy a Vision Transformer (ViT) to leverage its robust patch-feature extraction. A Pose-guided Feature Aggregation (PFA) module preliminarily disentangles pose information from the patch features, yielding a clearer separation of semantic components such as human body parts; a sketch of this aggregation step is given below. A transformer decoder with a set of learnable semantic views then further enhances discriminative learning of body-part features. Because these views are not guaranteed to correspond to the body without additional supervision, the Pose-View Matching (PVM) module explicitly matches visible parts, thereby segregating occlusion features effectively.
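To make the PFA idea concrete, here is a minimal PyTorch sketch of pose-guided aggregation: keypoint heatmaps from an off-the-shelf pose estimator are resized to the ViT patch grid and used as spatial attention to pool patch tokens into per-keypoint features. The class name, shapes, and the square-grid assumption are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseGuidedAggregation(nn.Module):
    """Illustrative pose-guided feature aggregation (not the official PFD code):
    each keypoint heatmap acts as a spatial attention map over ViT patch tokens,
    producing one pooled feature vector per keypoint."""

    def __init__(self, num_keypoints: int = 17):
        super().__init__()
        self.num_keypoints = num_keypoints

    def forward(self, patch_feats: torch.Tensor, heatmaps: torch.Tensor) -> torch.Tensor:
        # patch_feats: (B, N, C) -- N = H*W patch tokens from the ViT encoder
        # heatmaps:    (B, K, Hh, Wh) -- K keypoint heatmaps from a pose estimator
        B, N, C = patch_feats.shape
        H = W = int(N ** 0.5)  # assume a square patch grid for simplicity
        # Resize heatmaps to the patch grid so they align with the tokens.
        maps = F.interpolate(heatmaps, size=(H, W), mode="bilinear",
                             align_corners=False)          # (B, K, H, W)
        # One attention distribution over the N patches per keypoint.
        weights = maps.flatten(2).softmax(dim=-1)           # (B, K, N)
        # Attention-weighted sum of patch features -> per-keypoint features.
        part_feats = torch.einsum("bkn,bnc->bkc", weights, patch_feats)
        return part_feats                                   # (B, K, C)
```

In this reading, the "matching and distributing mechanism" amounts to distributing patch evidence to keypoints via heatmap-derived weights, so each disentangled part feature is dominated by the image region around its joint.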

A further component of the PFD architecture is the Pose-guided Push Loss, designed to emphasize the features of visible body parts while suppressing interference from occlusions and background noise. This multi-stage Re-ID pipeline is validated on five challenging datasets covering both occluded and holistic Re-ID tasks, where it outperforms existing state-of-the-art approaches. The paper reports clear gains in rank-1 accuracy and mean Average Precision (mAP), particularly on the difficult Occluded-Duke and Occluded-REID benchmarks, indicating strong practical applicability.
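For intuition, the following is one plausible margin-loss formulation in the spirit of the Pose-guided Push Loss; the paper's exact loss may differ. It assumes per-part visibility flags obtained by thresholding keypoint confidences: visible parts are pulled toward the global identity feature, while parts flagged as occluded are pushed beyond a margin.

```python
import torch
import torch.nn.functional as F

def pose_guided_push_loss(anchor: torch.Tensor,
                          part_feats: torch.Tensor,
                          visibility: torch.Tensor,
                          margin: float = 0.3) -> torch.Tensor:
    """Illustrative push-style loss (an assumption, not the paper's formula).

    anchor:     (B, C)    global identity feature
    part_feats: (B, K, C) per-part features from the decoder
    visibility: (B, K)    1.0 for visible parts, 0.0 for occluded ones
    """
    # Cosine distance between each part feature and the identity anchor.
    dist = 1.0 - F.cosine_similarity(part_feats, anchor.unsqueeze(1), dim=-1)  # (B, K)
    # Pull visible parts toward the anchor (small distance is rewarded).
    pull = (visibility * dist).sum() / visibility.sum().clamp(min=1.0)
    # Push occluded parts at least `margin` away from the anchor.
    push = ((1.0 - visibility) * F.relu(margin - dist)).sum() / \
           (1.0 - visibility).sum().clamp(min=1.0)
    return pull + push
```

The design choice here is the hinge on occluded parts: once an occlusion feature is farther than the margin from the identity representation, it stops contributing gradient, so the model is discouraged from encoding identity information in occluded views without collapsing them entirely.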

The implications of this research are notable for surveillance and security applications, where accurate person tracking amid obstructions is crucial. PFD's integration of pose information with transformer frameworks offers a promising direction for handling occluded scenarios in Re-ID. The work could be extended by exploring more advanced pose estimation techniques to reduce the impact of pose noise, as observed in the authors' sensitivity analysis, and by refining the semantic-view learning to yield even more precise feature disentanglement. Overall, the paper lays a compelling groundwork for advancing occluded person Re-ID with transformer networks, balancing discriminative feature learning against computational efficiency.