Large-Scale Pre-training for Person Re-identification with Noisy Labels

Published 30 Mar 2022 in cs.CV | (2203.16533v2)

Abstract: This paper addresses the problem of pre-training for person re-identification (Re-ID) with noisy labels. To set up the pre-training task, we apply a simple online multi-object tracking system to raw videos of an existing unlabeled Re-ID dataset, "LUPerson", and build its noisy-labeled variant, "LUPerson-NL". Since these ID labels, automatically derived from tracklets, inevitably contain noise, we develop a large-scale Pre-training framework utilizing Noisy Labels (PNL), which consists of three learning modules: supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning. In principle, jointly learning these three modules not only clusters similar examples around one prototype but also rectifies noisy labels based on the prototype assignment. We demonstrate that learning directly from raw videos is a promising alternative for pre-training, one that uses spatial and temporal correlations as weak supervision. This simple pre-training task provides a scalable way to learn state-of-the-art (SOTA) Re-ID representations from scratch on "LUPerson-NL" without bells and whistles. For example, when the same supervised Re-ID method, MGN, is applied, our pre-trained model improves mAP over the unsupervised pre-training counterpart by 5.7%, 2.2%, and 2.3% on CUHK03, DukeMTMC, and MSMT17, respectively. Under small-scale or few-shot settings, the performance gain is even larger, suggesting better transferability of the learned representation. Code is available at https://github.com/DengpanFu/LUPerson-NL



Summary

The paper introduces PNL, a pre-training framework that combines supervised Re-ID learning, prototype-based contrastive learning, and label-guided contrastive learning to cope with label noise in person re-identification.
It constructs the LUPerson-NL dataset of about 10 million images and 430,000 identities, whose labels are generated automatically from raw video data.
Experiments on standard benchmarks show consistent mAP gains over unsupervised pre-training, with even larger gains in small-scale and few-shot settings, which points to better transferability and a scalable route to Re-ID pre-training.


The paper "Large-Scale Pre-training for Person Re-identification with Noisy Labels" studies how large-scale pre-training with noisy labels can be used for the person re-identification (Re-ID) task. Person Re-ID is the task of matching images of the same person across different cameras or time points, which requires a model to distinguish between visually similar individuals; it is particularly useful in surveillance, security, and search applications.

Dataset Construction

The study introduces a new dataset, "LUPerson-NL", a noisy-labeled variant of the existing unlabeled "LUPerson" dataset. It contains approximately 10 million images of around 430,000 identities. The labels are produced by running an online multi-object tracking system on the raw videos: every person crop in a tracklet receives that tracklet's ID, without human intervention, so the labels encode the spatial and temporal correlations between video frames. The dataset is far larger than existing manually labeled Re-ID datasets, which makes it a promising foundation for large-scale pre-training.
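
The tracker used to build LUPerson-NL is not described in this summary, so the sketch below substitutes a deliberately simple greedy IoU linker to show how tracklets turn into noisy ID labels: each detection inherits the ID of the track it joins. The names `track_and_label` and `iou_threshold` are hypothetical. Because this linker ignores appearance, a person who disappears and reappears gets a new ID, and overlapping people can swap IDs; errors of this kind are what make tracklet-derived labels noisy.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track_and_label(frames, iou_threshold=0.5):
    """Hypothetical greedy linker: return (frame_index, box, track_id) triples.

    Each detection joins the not-yet-matched active track whose last box overlaps
    it most (IoU >= iou_threshold); otherwise it starts a new track. Tracks that
    receive no detection in a frame are dropped, so a reappearing person gets a new ID.
    """
    next_id = 0
    active = {}  # track_id -> last box seen for that track
    labels = []
    for t, detections in enumerate(frames):
        new_active = {}
        for box in detections:
            best_id, best_iou = None, iou_threshold
            for tid, last_box in active.items():
                if tid in new_active:  # each track takes at most one box per frame
                    continue
                score = iou(box, last_box)
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no sufficiently overlapping track: start a new one
                best_id, next_id = next_id, next_id + 1
            new_active[best_id] = box
            labels.append((t, box, best_id))  # the crop's noisy ID label
        active = new_active
    return labels


frames = [
    [(0, 0, 10, 20), (50, 0, 60, 20)],  # frame 0: two people
    [(1, 0, 11, 20), (51, 0, 61, 20)],  # frame 1: both move slightly
    [(30, 0, 40, 20)],                  # frame 2: a box overlapping neither track
]
print([tid for _, _, tid in track_and_label(frames)])  # [0, 1, 0, 1, 2]
```

Whether the box in frame 2 is truly a third person or one of the first two, it receives a fresh ID; a real tracker reduces but does not eliminate such mistakes.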

Pre-training Framework

The proposed pre-training framework, PNL (Pre-training with Noisy Labels), combines three learning modules:

Supervised Re-ID Learning: This module performs identity classification with the tracklet-derived labels as targets, while accounting for the noise inherent in automatic label assignment.
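Prototype-based Contrastive Learning: This module pulls each instance toward the prototype (a representative embedding) of its identity and away from the other prototypes. The prototypes serve two purposes: they shape the embedding space, and, as they are updated during training, they rectify label noise by reassigning examples whose features fall closer to another identity's prototype.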
Label-guided Contrastive Learning: This module uses the rectified labels to select contrastive pairs: samples that share a rectified label are treated as positives and all others as negatives, which sharpens the separation between identities.
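
A minimal pure-Python sketch of the prototype and label-guided mechanics described above, under stated assumptions: prototypes are kept as a momentum-updated, L2-normalized mean per identity; a label is rectified by switching to the nearest prototype when its cosine similarity exceeds a threshold; and the label-guided loss is a supervised-contrastive loss over a queue of keys (e.g., a MoCo-style memory queue; here just fixed vectors). All function names and the values of `momentum`, `threshold`, and `tau` are hypothetical, and the paper's exact correction criterion and losses may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits, i):
    """log(softmax(logits)[i]), computed stably by subtracting the max logit."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[i] - log_z


def update_prototype(prototypes, label, feat, momentum=0.9):
    """Hypothetical momentum update of one identity's prototype toward `feat`."""
    mixed = [momentum * p + (1 - momentum) * f for p, f in zip(prototypes[label], feat)]
    prototypes[label] = normalize(mixed)


def rectify_label(prototypes, label, feat, threshold=0.8):
    """Hypothetical rule: switch to the nearest prototype if it is similar enough."""
    sims = [dot(feat, p) for p in prototypes]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best if best != label and sims[best] > threshold else label


def prototype_loss(prototypes, label, feat, tau=0.1):
    """InfoNCE over prototypes: pull `feat` to its prototype, push it from the others."""
    logits = [dot(feat, p) / tau for p in prototypes]
    return -log_softmax_at(logits, label)


def label_guided_loss(feat, label, queue, tau=0.1):
    """Supervised-contrastive loss: queued (key, label) entries sharing `label` are positives."""
    logits = [dot(feat, key) / tau for key, _ in queue]
    positives = [i for i, (_, y) in enumerate(queue) if y == label]
    if not positives:
        return 0.0
    return -sum(log_softmax_at(logits, i) for i in positives) / len(positives)


prototypes = [normalize([1.0, 0.0]), normalize([0.0, 1.0])]  # identities 0 and 1
feat = normalize([0.1, 0.9])  # the tracklet says ID 0, but the crop looks like ID 1
label = rectify_label(prototypes, 0, feat)
print(label)  # 1
update_prototype(prototypes, label, feat)
queue = [(normalize([0.0, 1.0]), 1), (normalize([1.0, 0.0]), 0), (normalize([0.2, 1.0]), 1)]
print(prototype_loss(prototypes, label, feat) < prototype_loss(prototypes, 0, feat))  # True
print(label_guided_loss(feat, label, queue) < label_guided_loss(feat, 0, queue))  # True
```

The two comparisons show why rectification matters: training against the corrected label yields a much lower loss than forcing the crop toward the tracker's wrong identity.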

Together, these components let the framework effectively exploit the weak supervision provided by spatial-temporal correlations in video.
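
One way to write the joint objective compactly is as a weighted sum of the three module losses. This is a sketch under assumptions: the weights $\lambda_{\mathrm{pro}}$ and $\lambda_{\mathrm{lgc}}$, the temperature $\tau$, and all symbols are illustrative notation rather than the paper's, and its exact formulation may differ. Here $f$ is the L2-normalized embedding of image $x$, $p_{\hat{y}}(x)$ the classifier's softmax probability for the rectified label $\hat{y}$, $c_k$ the prototype of identity $k$, $k_m$ the keys in a memory queue, and $P(\hat{y})$ the set of keys whose label equals $\hat{y}$.

```latex
\begin{aligned}
\mathcal{L} &= \mathcal{L}_{\mathrm{ce}} + \lambda_{\mathrm{pro}}\,\mathcal{L}_{\mathrm{pro}} + \lambda_{\mathrm{lgc}}\,\mathcal{L}_{\mathrm{lgc}}, \\
\mathcal{L}_{\mathrm{ce}} &= -\log p_{\hat{y}}(x), \\
\mathcal{L}_{\mathrm{pro}} &= -\log \frac{\exp\!\left(f^{\top} c_{\hat{y}} / \tau\right)}{\sum_{k} \exp\!\left(f^{\top} c_{k} / \tau\right)}, \\
\mathcal{L}_{\mathrm{lgc}} &= -\frac{1}{\lvert P(\hat{y}) \rvert} \sum_{j \in P(\hat{y})} \log \frac{\exp\!\left(f^{\top} k_{j} / \tau\right)}{\sum_{m} \exp\!\left(f^{\top} k_{m} / \tau\right)}.
\end{aligned}
```

All three terms depend on $\hat{y}$, which is how label rectification feeds back into every module.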

Experimental Results

The method is evaluated on popular Re-ID benchmarks: CUHK03, Market1501, DukeMTMC, and MSMT17. The pre-trained models improve mean Average Precision (mAP) over both unsupervised pre-training and ImageNet pre-training. For example, with the same supervised Re-ID method (MGN), PNL pre-training improves mAP over the unsupervised pre-training counterpart by 5.7%, 2.2%, and 2.3% on CUHK03, DukeMTMC, and MSMT17, respectively. The gains are largest in settings with limited supervision.
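
Since the results are reported as mAP, here is a minimal sketch of how Re-ID mAP is commonly computed: each query ranks the gallery by cosine similarity, its average precision (AP) is the mean of the precision values at each rank where the query's identity appears, and mAP averages AP over all queries. Assumptions: the function names are hypothetical, embeddings are nonzero vectors, and standard benchmark protocols additionally discard gallery images of the query identity from the same camera, which this sketch omits.

```python
import math


def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def average_precision(query, query_id, gallery):
    """AP of one query against a gallery of (embedding, identity) pairs."""
    ranked = sorted(gallery, key=lambda g: cosine(query, g[0]), reverse=True)
    hits, precisions = 0, []
    for rank, (_, gid) in enumerate(ranked, start=1):
        if gid == query_id:  # a correct match: record precision at this rank
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries, gallery):
    """mAP: average of per-query AP over (embedding, identity) queries."""
    return sum(average_precision(q, qid, gallery) for q, qid in queries) / len(queries)


gallery = [([1.0, 0.0], "a"), ([0.0, 1.0], "b"), ([0.8, 0.6], "b")]
# Identity "b" is retrieved at ranks 2 and 3, so AP = (1/2 + 2/3) / 2.
print(round(average_precision([1.0, 0.1], "b", gallery), 4))  # 0.5833
```

Because AP rewards placing every correct match early, a few-point mAP gain reflects better overall ranking of true matches, not just a better top-1 hit.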

Further analyses show that PNL's pre-trained representations transfer better, particularly in small-scale and few-shot settings. The paper argues that pre-training with noisy labels offers a scalable and beneficial resource for person Re-ID.

Implications and Future Directions

The study reinforces the value of large-scale pre-training and suggests that Re-ID could rely more on automatically assigned noisy labels instead of costly manual annotation. In practice, video surveillance, public security, and autonomous systems could benefit from the improved Re-ID representations.

Future directions suggested by the study include refining the tracking algorithms to reduce label noise and extending the methodology to other vision tasks, potentially beyond person-specific embeddings. Further use of the temporal and spatial correlations in video may also open new avenues for automatically supervised representation learning.

In conclusion, the paper shows how noisy-labeled datasets can be leveraged at scale, reinforcing the impact of large-scale pre-training on the Re-ID task.
