Beyond Part Models: Person Retrieval with Refined Part Pooling (and a Strong Convolutional Baseline) (1711.09349v3)

Published 26 Nov 2017 in cs.CV

Abstract: Employing part-level features for pedestrian image description offers fine-grained information and has been verified as beneficial for person retrieval in very recent literature. A prerequisite of part discovery is that each part should be well located. Instead of using external cues, e.g., pose estimation, to directly locate parts, this paper lays emphasis on the content consistency within each part. Specifically, we target at learning discriminative part-informed features for person retrieval and make two contributions. (i) A network named Part-based Convolutional Baseline (PCB). Given an image input, it outputs a convolutional descriptor consisting of several part-level features. With a uniform partition strategy, PCB achieves competitive results with the state-of-the-art methods, proving itself as a strong convolutional baseline for person retrieval. (ii) A refined part pooling (RPP) method. Uniform partition inevitably incurs outliers in each part, which are in fact more similar to other parts. RPP re-assigns these outliers to the parts they are closest to, resulting in refined parts with enhanced within-part consistency. Experiment confirms that RPP allows PCB to gain another round of performance boost. For instance, on the Market-1501 dataset, we achieve (77.4+4.2)% mAP and (92.3+1.5)% rank-1 accuracy, surpassing the state of the art by a large margin.

Summary

The paper introduces a Part-based Convolutional Baseline (PCB) and a Refined Part Pooling (RPP) method to enhance feature consistency in person retrieval.
It employs a uniform partition strategy on convolutional features and refines part assignments dynamically without extra part labels.
Experimental results on benchmarks like Market-1501, DukeMTMC-reID, and CUHK03 show state-of-the-art performance with rank-1 accuracy up to 93.8%.

Overview of "Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline)"

The paper "Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline)" by Yifan Sun et al. addresses person retrieval, focusing on learning discriminative part-informed features. It contributes a Part-based Convolutional Baseline (PCB) and a Refined Part Pooling (RPP) method, and reports state-of-the-art results on several benchmark datasets.

Contributions

The primary contributions of this work are twofold:

Part-based Convolutional Baseline (PCB): The authors propose PCB, which applies a uniform partition strategy to the convolutional features of an input image. By generating part-level features and assembling them into a single convolutional descriptor, PCB serves as a strong baseline for person retrieval.
Refined Part Pooling (RPP): The RPP method rectifies uniform partition by addressing within-part inconsistency. Instead of relying solely on predefined partitions, RPP dynamically reassigns inconsistent features to more appropriate parts based on their content, thereby enhancing the consistency within each part.

Methodology

Part-based Convolutional Baseline (PCB)

The PCB architecture builds on a standard convolutional neural network and modifies the final layers to produce part-level features. The feature map is divided uniformly into several horizontal stripes, and each stripe is pooled into one part-level descriptor. These descriptors are then passed through final convolutional and fully connected layers to obtain discriminative part features, which together form the image descriptor. PCB requires no external cues such as human pose estimation, which keeps the pipeline simple.
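The uniform partition step can be illustrated with a minimal sketch. It assumes a feature map stored as nested Python lists in channel, row, column order and uses average pooling within each stripe. The function name `uniform_part_pool`, the toy sizes, and the plain concatenation at the end are illustrative assumptions, not the authors' implementation, and the later convolutional and fully connected layers are omitted.

```python
def uniform_part_pool(feature_map, num_parts):
    """Average-pool a C x H x W feature map into num_parts horizontal stripes.

    feature_map: list of C channels, each a list of H rows of W floats.
    Returns a list of num_parts part vectors, each of length C.
    """
    height = len(feature_map[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    rows_per_part = height // num_parts
    parts = []
    for p in range(num_parts):
        top, bottom = p * rows_per_part, (p + 1) * rows_per_part
        vector = []
        for channel in feature_map:
            # Flatten the rows of this stripe and take their mean.
            values = [v for row in channel[top:bottom] for v in row]
            vector.append(sum(values) / len(values))
        parts.append(vector)
    return parts


# Toy input: 2 channels, 4 rows, 2 columns (sizes chosen for illustration only).
toy = [
    [[1.0, 1.0], [1.0, 1.0], [3.0, 3.0], [3.0, 3.0]],
    [[0.0, 2.0], [0.0, 2.0], [4.0, 6.0], [4.0, 6.0]],
]
part_vectors = uniform_part_pool(toy, num_parts=2)
# Concatenate the part vectors into a single descriptor.
descriptor = [x for vec in part_vectors for x in vec]
print(part_vectors)  # [[1.0, 1.0], [3.0, 5.0]]
```

Each stripe contributes one vector regardless of what it contains, which is why a fixed partition can place a feature in a stripe whose content it does not match.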

Refined Part Pooling (RPP)

RPP improves on the uniform partition by handling the outliers within each part, that is, features that are more similar to another part than to the one they were assigned to. Starting from a PCB model trained with uniform partition, RPP uses a part classifier to refine the initial parts. The classifier is trained through an induced procedure that starts from the already-trained PCB model, and it reassigns outlier features to the parts they are closest to, which strengthens the internal consistency of each part. The refinement needs no additional part labels.
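A rough sketch of the reassignment idea follows. It assumes the part classifier is a linear scorer followed by a softmax, and that each part feature is a weighted average of all column vectors using those softmax probabilities. Both choices, the normalization by total assignment mass, and the names `refined_part_pool` and `part_weights` are assumptions made for illustration; the summary above does not specify these details, and the classifier weights here are hand-picked rather than learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refined_part_pool(column_vectors, part_weights):
    """Pool column vectors into parts using soft assignments.

    column_vectors: list of N feature vectors, each of length C.
    part_weights: list of P weight vectors of length C, one per part
        (a hypothetical linear part classifier).
    Returns (assignments, part_features).
    """
    # Probability of each column vector belonging to each part.
    assignments = [
        softmax([sum(w * x for w, x in zip(weights, f)) for weights in part_weights])
        for f in column_vectors
    ]
    channels = len(column_vectors[0])
    part_features = []
    for p in range(len(part_weights)):
        mass = sum(a[p] for a in assignments)  # always > 0 for a softmax
        feature = [
            sum(a[p] * f[c] for a, f in zip(assignments, column_vectors)) / mass
            for c in range(channels)
        ]
        part_features.append(feature)
    return assignments, part_features


# Three toy column vectors: the first two resemble part 0, the third part 1.
columns = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights = [[5.0, 0.0], [0.0, 5.0]]  # hand-picked, not learned
assignments, part_features = refined_part_pool(columns, weights)
```

Because assignment depends on the content of each column vector rather than its position, a vector that falls inside one stripe can still contribute mostly to a different part.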

Experimental Results

PCB combined with RPP is evaluated on three benchmark datasets: Market-1501, DukeMTMC-reID, and CUHK03. On Market-1501, PCB with RPP achieves 93.8% rank-1 accuracy and 81.6% mAP, which per the abstract is a gain of 1.5 and 4.2 points over PCB alone (92.3% and 77.4%) and surpasses the previous state of the art by a large margin. Similar improvements are reported on DukeMTMC-reID and CUHK03, indicating that the approach holds up across datasets.
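For readers unfamiliar with the two reported metrics, the sketch below computes rank-1 accuracy and mean average precision (mAP) from ranked retrieval results. It assumes each query is given as a list of booleans marking, in ranked order, whether each gallery item shares the query's identity. The function name `rank1_and_map` and the toy data are illustrative, and details of the official benchmark protocols are omitted.

```python
def rank1_and_map(ranked_matches):
    """Compute rank-1 accuracy and mAP.

    ranked_matches: one list per query of booleans in ranked order,
        True where the gallery item has the same identity as the query.
    """
    rank1_hits = 0
    ap_sum = 0.0
    for matches in ranked_matches:
        rank1_hits += 1 if matches[0] else 0
        hits, precision_sum = 0, 0.0
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / position  # precision at this hit
        # Average precision for this query (0 if it has no true match).
        ap_sum += precision_sum / hits if hits else 0.0
    n = len(ranked_matches)
    return rank1_hits / n, ap_sum / n


# Two toy queries with three gallery items each.
rank1, mean_ap = rank1_and_map([[True, False, True], [False, True, False]])
```

Rank-1 only checks the top result, while mAP rewards ranking every true match early, which is why the two numbers can differ noticeably for the same system.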

Practical and Theoretical Implications

The implications of this research are notable both in practice and theory.

Practical Implications: PCB and RPP could be used in real-world surveillance systems to improve the accuracy and reliability of person identification in crowded, dynamic environments.
Theoretical Implications: The work offers a new perspective on part-based models by shifting the focus from external part cues to the internal consistency of feature representations. This shift could influence further research on part-based learning in other computer vision tasks.

Future Developments

Future research can build upon this work by exploring several avenues:

Integration with Other Models: Combining PCB and RPP with other state-of-the-art models could yield even higher accuracy and robustness in diverse conditions.
Real-Time Applications: Optimizing the computational efficiency of PCB and RPP could enable real-time person retrieval in dynamic scenes.
Extension to Other Domains: Applying the principles of part consistency and refined pooling to other domains such as object detection and facial recognition.

In conclusion, the introduction of PCB and RPP represents a significant step forward in the field of person retrieval. The demonstrated performance improvements and theoretical insights provided by this paper are likely to inspire subsequent advancements and applications in related areas of computer vision.