Unsupervised Person Re-identification by Deep Learning Tracklet Association (1809.02874v1)

Published 8 Sep 2018 in cs.CV

Abstract: Most existing person re-identification (re-id) methods rely on supervised model learning on per-camera-pair manually labelled pairwise training data. This leads to poor scalability in practical re-id deployment due to the lack of exhaustive identity labelling of image positive and negative pairs for every camera pair. In this work, we address this problem by proposing an unsupervised re-id deep learning approach capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data from videos in an end-to-end model optimisation. We formulate a Tracklet Association Unsupervised Deep Learning (TAUDL) framework characterised by jointly learning per-camera (within-camera) tracklet association (labelling) and cross-camera tracklet correlation by maximising the discovery of most likely tracklet relationships across camera views. Extensive experiments demonstrate the superiority of the proposed TAUDL model over the state-of-the-art unsupervised and domain adaptation re-id methods using six person re-id benchmarking datasets.

Authors (3)

Minxian Li
Xiatian Zhu
Shaogang Gong

Summary

Analysis of "Unsupervised Person Re-identification by Deep Learning Tracklet Association"

This paper presents Tracklet Association Unsupervised Deep Learning (TAUDL), an unsupervised deep learning framework for person re-identification (re-id). Its primary contribution is removing the dependence of conventional re-id methods on manually labeled data: TAUDL learns without identity-labeled data, which makes it more scalable for real-world deployment across large and diverse video surveillance environments.

Methodology and Framework

The authors design an unsupervised tracklet sampling and labeling mechanism called Sparse Space-Time Tracklet (SSTT) sampling. By exploiting temporal and spatial constraints within each single-camera view, it assigns labels to tracklets while keeping the chance that two sampled tracklets show the same person low. This reduces the redundancy and labeling errors typical of traditional tracklet formation approaches, which makes it a low-cost basis for unsupervised re-id.
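
The sampling idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `Tracklet` fields, the function name `sstt_sample`, and the parameters `gap`, `min_dist`, and `horizon` are all hypothetical, and the greedy selection rule is an assumption about how temporal and spatial sparsity might be enforced within one camera.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Tracklet:
    tid: int      # tracklet id within one camera (hypothetical field)
    start: float  # first frame time, in seconds
    end: float    # last frame time, in seconds
    x: float      # representative image position, in pixels
    y: float


def sstt_sample(tracklets, gap, min_dist, horizon):
    """Return {tid: label} for tracklets that are sparse in time and space.

    Assumption: sampling instants are `gap` seconds apart, chosen long
    enough that most people have left the view between instants. At each
    instant, tracklets closer than `min_dist` to an already kept one are
    skipped, since nearby detections may belong to the same person.
    """
    labels = {}
    s = 0.0
    while s <= horizon:
        kept = []
        for t in tracklets:
            # Skip tracklets already labelled or not active at this instant.
            if t.tid in labels or not (t.start <= s <= t.end):
                continue
            if all(hypot(t.x - k.x, t.y - k.y) >= min_dist for k in kept):
                kept.append(t)
                labels[t.tid] = len(labels)  # one new label per kept tracklet
        s += gap
    return labels
```

Each kept tracklet receives its own per-camera label, so the labels are only meaningful inside the camera that produced them. Tracklets that fall between sampling instants, or too close to a kept tracklet, are simply left out of training in this sketch.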

The core of the TAUDL framework is divided into two key learning components:

Per-Camera Tracklet Discrimination Learning (PCTD): This process optimizes feature learning within each camera using a multi-task learning approach. By handling per-camera tasks separately yet sharing a common feature space, the approach capitalizes on intra-view data discrimination. The multi-branch network architecture enables the model to handle diverse camera-specific attribute distributions effectively.
Cross-Camera Tracklet Association Learning (CCTA): Instead of relying on explicit cross-view identity matching, the authors introduce a loss function that aligns feature distributions across camera views. Because no cross-camera identity labels are available, the loss uses nearest-neighbor search in the shared feature space to find, for each tracklet, its most likely matches in other camera views and pulls them together. This helps the model generalize across camera setups without manually specified cross-camera constraints.
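
The two components above can be sketched as a pair of NumPy loss functions. This is a hedged illustration, not the paper's formulation: the function names, the per-camera linear `heads`, the Gaussian similarity, and the parameters `k` and `sigma` are assumptions chosen to make the idea concrete. The sketch also assumes every batch contains tracklets from at least two cameras.

```python
import numpy as np


def pctd_loss(features, labels, cams, heads):
    """Mean softmax cross-entropy, each sample scored by its own camera's head.

    features: (n, d) shared embeddings; labels: per-camera tracklet labels;
    cams: camera index per sample; heads: list of (d, num_labels_c) matrices.
    """
    losses = []
    for f, y, c in zip(features, labels, cams):
        logits = f @ heads[c]
        logits = logits - logits.max()  # subtract max for numerical stability
        log_prob = logits - np.log(np.exp(logits).sum())
        losses.append(-log_prob[y])
    return float(np.mean(losses))


def ccta_loss(tracklet_feats, cams, k=1, sigma=1.0):
    """Soft nearest-neighbour loss over tracklet-level features.

    For each tracklet, the k closest tracklets from other cameras are
    treated as likely matches; the loss is low when they dominate the
    similarity mass over all other tracklets in the batch.
    """
    diff = tracklet_feats[:, None, :] - tracklet_feats[None, :, :]
    d2 = (diff ** 2).sum(-1)                 # pairwise squared distances
    sim = np.exp(-d2 / (2 * sigma ** 2))     # Gaussian similarity (assumed)
    np.fill_diagonal(sim, 0.0)               # a tracklet is not its own match
    losses = []
    for i in range(len(cams)):
        cross = np.flatnonzero(cams != cams[i])          # other-camera indices
        nearest = cross[np.argsort(d2[i, cross])[:k]]    # k closest of those
        losses.append(-np.log(sim[i, nearest].sum() / sim[i].sum()))
    return float(np.mean(losses))
```

In a training loop the two terms would presumably be combined into one objective and minimised end to end; how they are weighted is not stated in this summary, so any weighting is left to the reader.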

Evaluation and Results

Experiments on six benchmark datasets (CUHK03, Market-1501, DukeMTMC, iLIDS-VID, PRID2011, and MARS) show that TAUDL outperforms existing state-of-the-art unsupervised and domain adaptation models. The improvements in Rank-1 accuracy and mAP are reported as significant on the large-scale image and video-based datasets, which supports its applicability in diverse scenarios.
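
For readers unfamiliar with the two metrics, the sketch below shows how Rank-1 accuracy and mAP are commonly computed from a query-to-gallery distance matrix. It is a simplified illustration, not the benchmarks' official evaluation code: the function name is hypothetical, and the usual step of excluding gallery images from the query's own camera is omitted.

```python
import numpy as np


def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mean average precision (mAP).

    dist: (num_query, num_gallery) distances; smaller means more similar.
    query_ids, gallery_ids: integer identity arrays.
    """
    rank1_hits, aps = [], []
    for i, qid in enumerate(query_ids):
        order = np.argsort(dist[i])              # gallery sorted by distance
        matches = gallery_ids[order] == qid      # True where identity agrees
        if not matches.any():
            continue                             # query has no true match
        rank1_hits.append(matches[0])            # is the top result correct?
        hit_ranks = np.flatnonzero(matches) + 1  # 1-based ranks of true matches
        precisions = np.arange(1, len(hit_ranks) + 1) / hit_ranks
        aps.append(precisions.mean())            # average precision, one query
    return float(np.mean(rank1_hits)), float(np.mean(aps))
```

Rank-1 only asks whether the single closest gallery item is correct, while mAP rewards ranking every true match near the top, which is why the two numbers can differ widely on datasets with many gallery images per identity.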

The paper further explores the influence of various TAUDL components. PCTD offers robust within-camera feature representation enhancing cross-view discrimination, while the integration of CCTA significantly boosts the framework’s capability to associate data across disparate camera perspectives.

Implications and Future Directions

The TAUDL framework has implications for both practical surveillance systems and the theoretical development of unsupervised deep learning models. By eliminating the need for exhaustively labeled datasets, it matches the needs of scalable, real-world applications deployed across heterogeneous environments.

Future developments might focus on refining the TAUDL framework to manage more complex scenarios with high degrees of identity overlap and fragmentation in tracklet formation. Additionally, exploring the fusion of TAUDL with real-time systems and edge computing scenarios offers exciting avenues for enhancing and deploying privacy-aware, video-based re-identification solutions.

In conclusion, the paper presents a significant advance in unsupervised re-id methodology and argues for more autonomous and scalable surveillance systems that need no manual labeling to track individuals across large camera networks.
