- The paper introduces a Part-Aware Transformer that leverages learnable part prototypes to identify discriminative human body parts even under occlusion.
- It employs a pixel context-based encoder with self-attention to effectively filter out background noise and enhance feature robustness.
- PAT achieves state-of-the-art results on occluded, partial, and holistic Re-ID benchmarks, demonstrating robust person re-identification without relying on external pose or parsing models.
Diverse Part Discovery: Occluded Person Re-identification with Part-Aware Transformer
The paper introduces a novel method for the challenging problem of occluded person re-identification (Re-ID) using a Part-Aware Transformer (PAT). The approach targets person occlusion, which typically occurs in crowded environments where individuals are partially obstructed by other objects or people. PAT is a unified deep learning framework built on a transformer encoder-decoder architecture that discovers diverse human body parts.
A key innovation in this paper is leveraging the transformer architecture to process and learn representations for occluded person Re-ID, a departure from many traditional methods which either rely on rigid handcrafted feature splits or depend heavily on auxiliary pose estimation or human parsing models. The proposed architecture consists of two main components: a pixel context-based transformer encoder and a part prototype-based transformer decoder. The encoder incorporates a self-attention mechanism to capture global pixel context and improve robustness against background noise. This is crucial for focusing on the pertinent features of the occluded subjects while minimizing the influence of irrelevant image data.
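To make the encoder's role concrete, the following is a minimal PyTorch sketch of the pixel-context idea: CNN backbone features are flattened into pixel tokens and passed through self-attention layers so that every pixel attends to global context. The class name, dimensions, and hyperparameters here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PixelContextEncoder(nn.Module):
    """Hypothetical sketch of a pixel context-based transformer encoder."""
    def __init__(self, feat_dim=2048, embed_dim=256, num_heads=8, num_layers=2):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, embed_dim, kernel_size=1)  # reduce channel dim
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feat_map, pos_embed):
        # feat_map: (B, C, H, W) CNN backbone features
        # pos_embed: (B, H*W, embed_dim) positional embeddings (assumed given)
        x = self.proj(feat_map)                  # (B, D, H, W)
        x = x.flatten(2).transpose(1, 2)         # (B, H*W, D) pixel tokens
        # Self-attention over all pixel tokens captures global pixel context,
        # which helps down-weight background and occluder pixels.
        return self.encoder(x + pos_embed)       # (B, H*W, D)
```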
The decoder introduces a novel component termed "part prototypes": learnable queries that discover discriminative parts of a person's body, even in partially occluded images. Each prototype produces a part-aware mask, an attention map that highlights a particular body region, and these masks let the model pool part-level representations that are both robust and informative.
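A hedged sketch of the part-prototype mechanism follows: learnable queries cross-attend to the encoded pixel tokens, their attention maps act as part-aware masks, and mask-weighted pooling yields one feature per part. Names, shapes, and the simplified single-step attention are assumptions for illustration rather than the paper's exact decoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartPrototypeDecoder(nn.Module):
    """Illustrative sketch of part prototypes as learnable decoder queries."""
    def __init__(self, embed_dim=256, num_parts=6):
        super().__init__()
        # One learnable prototype (query) per body part to be discovered.
        self.prototypes = nn.Parameter(torch.randn(num_parts, embed_dim))

    def forward(self, pixel_tokens):
        # pixel_tokens: (B, N, D) encoder output, N = H*W
        B, N, D = pixel_tokens.shape
        q = self.prototypes.unsqueeze(0).expand(B, -1, -1)       # (B, P, D)
        # Each prototype's attention over all pixels is a part-aware mask.
        attn = torch.einsum('bpd,bnd->bpn', q, pixel_tokens) / D ** 0.5
        masks = F.softmax(attn, dim=-1)                           # (B, P, N)
        # Mask-weighted pooling gives one feature vector per discovered part.
        part_feats = torch.einsum('bpn,bnd->bpd', masks, pixel_tokens)
        return part_feats, masks
```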
To learn these part prototypes from identity labels alone, the paper introduces two mechanisms: part diversity and part discriminability. Part diversity encourages different prototypes to attend to different human regions rather than collapsing onto the same area. Complementarily, part discriminability ensures that the learned parts are useful for distinguishing identities, using identity classification and triplet losses as the supervisory signals.
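The sketch below shows one way these two signals could be combined using identity labels only. The diversity term (penalizing pairwise overlap between part masks) and the specific loss forms are illustrative stand-ins, not necessarily the exact formulations used in the paper; `classifier` and `triplet` are assumed callables supplied by the training code.

```python
import torch
import torch.nn.functional as F

def diversity_loss(masks):
    # masks: (B, P, N) part-aware attention maps from the decoder.
    m = F.normalize(masks, dim=-1)
    sim = torch.einsum('bpn,bqn->bpq', m, m)            # (B, P, P) pairwise overlap
    P = masks.size(1)
    off_diag = sim - torch.eye(P, device=masks.device)  # zero out self-similarity
    # Penalize overlap between different prototypes so parts stay diverse.
    return off_diag.clamp(min=0).sum(dim=(1, 2)).mean() / (P * (P - 1))

def discriminability_loss(part_feats, labels, classifier, triplet):
    # part_feats: (B, P, D) pooled part features; labels: (B,) identity labels.
    feat = part_feats.flatten(1)                        # concatenate parts: (B, P*D)
    ce = F.cross_entropy(classifier(feat), labels)      # identity classification
    tri = triplet(feat, labels)                         # e.g. hard-mining triplet loss
    return ce + tri
```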
PAT is validated on occluded, partial, and holistic Re-ID tasks. The results show that it outperforms existing state-of-the-art methods on several benchmarks, handling occlusion more effectively than previous models that rely on external modules such as pose estimators. On the Occluded-Duke dataset, for example, PAT achieves clear improvements in Rank-1 accuracy and mAP over prior solutions.
The practical implications of this method are significant in real-world scenarios where occlusion is a common problem, such as surveillance and security applications. Theoretically, the introduction of transformers into the Re-ID domain encourages a new line of research exploring self-attention mechanisms for complex vision tasks.
Future research directions may explore extending this framework to dynamically adapt to different degrees of occlusion and investigate how integrating temporal data from video sequences could further enhance Re-ID performance. Additionally, exploring ways to reduce computational overhead without compromising accuracy will be crucial for deploying such models in real-time systems.
This paper makes a substantial contribution to occluded person Re-ID by introducing a transformer-based approach that discovers and leverages diverse body parts under occlusion, setting a strong baseline for further developments in robust person identification systems.