Cross-view Asymmetric Metric Learning for Unsupervised Person Re-identification (1708.08062v2)

Published 27 Aug 2017 in cs.CV

Abstract: While metric learning is important for person re-identification (RE-ID), a significant problem in visual surveillance for cross-view pedestrian matching, existing metric models for RE-ID are mostly based on supervised learning that requires large quantities of labeled samples in all pairs of camera views for training. However, this limits their scalability to realistic applications, in which a large amount of data over multiple disjoint camera views is available but not labelled. To overcome this problem, we propose unsupervised asymmetric metric learning for unsupervised RE-ID. Our model aims to learn an asymmetric metric, i.e., a specific projection for each view, based on asymmetric clustering on cross-view person images. Our model finds a shared space where view-specific bias is alleviated and thus better matching performance can be achieved. Extensive experiments have been conducted on a baseline and five large-scale RE-ID datasets to demonstrate the effectiveness of the proposed model. Through the comparison, we show that our model is much better suited to unsupervised RE-ID than classical unsupervised metric learning models. We also compare with existing unsupervised RE-ID methods, and our model outperforms them by notable margins. Specifically, we report results on a large-scale unlabelled RE-ID dataset, which is important but has unfortunately received little attention in the literature.

Citations (328)

Summary

The paper introduces CAMEL, an unsupervised asymmetric metric learning framework that addresses view-specific biases for improved person re-identification.
It employs camera-specific projections and clustering to learn a shared feature space that effectively matches identities across different views.
Experiments on six datasets demonstrate CAMEL's scalability and superior performance over traditional unsupervised approaches.

Analysis of "Cross-view Asymmetric Metric Learning for Unsupervised Person Re-identification"

The paper "Cross-view Asymmetric Metric Learning for Unsupervised Person Re-identification" by Hong-Xing Yu, Ancong Wu, and Wei-Shi Zheng proposes an approach to the unsupervised person re-identification (RE-ID) problem. RE-ID is a core task in visual surveillance: the system must match and rank pedestrian identities across non-overlapping camera views, here without labeled training data. The work targets the inherent challenges of drastic intra-class variation and high inter-class similarity in person re-identification.

Key Contributions

The primary contribution of the paper is an unsupervised asymmetric metric learning framework termed CAMEL (Clustering-based Asymmetric Metric Learning) for unsupervised RE-ID. Unlike conventional supervised methods, which require large amounts of labeled cross-view training data, CAMEL operates without labels. It addresses a shortcoming of existing unsupervised models, which often do not account for the view-specific biases that perturb feature representations across camera views.

The theoretical model of CAMEL is underpinned by two core components:

Asymmetric Metric Learning: CAMEL learns a view-specific projection for each camera view, aiming to find a shared space in which view-specific bias is reduced and cross-view matching improves. This differs from traditional symmetric models, which apply one projection to all views and so cannot accommodate the bias each camera introduces.
Asymmetric Metric Clustering: The model structures unlabelled RE-ID data by clustering it in the learned shared space. This clustering is pivotal: with no labels available to guide learning, the clusters supply the structure needed to separate dissimilar data points.
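
The two components can be illustrated with a deliberately simplified sketch. It is not the authors' optimization: CAMEL learns the projections jointly with the clustering, whereas here each view's projection is just a whitening transform (an assumption chosen for brevity), followed by plain k-means in the resulting space. All function names, the synthetic data, and the distortion applied to the second view are hypothetical.

```python
import numpy as np

def view_whitening(X, eps=1e-6):
    """Return (mean, U) for one camera view so that U.T @ cov(X) @ U is ~identity.

    Stand-in for a learned view-specific projection (assumption, not CAMEL's solver).
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X) + eps * np.eye(X.shape[1])  # regularized covariance
    evals, evecs = np.linalg.eigh(cov)
    U = evecs / np.sqrt(evals)  # scale each eigenvector column by 1/sqrt(eigenvalue)
    return mu, U

def project(X, mu, U):
    """Map one view's features into the shared space with that view's projection."""
    return (X - mu) @ U

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels, centers

# Synthetic two-camera data: view B is a scaled and shifted copy of view A.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(200, 5))
view_b = view_a @ np.diag([2.0, 0.5, 1.5, 1.0, 0.8]) + 3.0  # hypothetical camera bias

# Each view gets its own projection; clustering then runs on the pooled samples.
shared = np.vstack([project(v, *view_whitening(v)) for v in (view_a, view_b)])
labels, centers = kmeans(shared, k=4)
```

Whitening only normalizes each view's mean and covariance, so it removes far less view-specific bias than a projection learned against the clustering objective would. The sketch is meant to show the data flow (one projection per view, one clustering over all views), not to reproduce the reported results.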

Experimental Results

The authors evaluated CAMEL on six datasets ranging in size from several hundred to several hundred thousand samples, including ExMarket, a newly constructed dataset with over 230,000 images. The experiments show that CAMEL consistently surpasses traditional unsupervised models, particularly on the larger datasets. This highlights its scalability, a trait lacking in existing unsupervised models that cannot efficiently handle massive datasets.
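
This summary does not say how matching performance is scored. As an assumption, the sketch below uses rank-k accuracy, a common RE-ID measure: each query image is compared against a gallery from another camera, the gallery is sorted by distance in the shared space, and a query counts as a hit if its identity appears among the top k results. The function names and toy feature vectors are hypothetical.

```python
import numpy as np

def rank_gallery(query, gallery):
    """For each query row, return gallery indices sorted by ascending Euclidean distance."""
    dists = np.linalg.norm(query[:, None, :] - gallery[None, :, :], axis=-1)
    return np.argsort(dists, axis=1)

def rank_k_accuracy(ranking, query_ids, gallery_ids, k=1):
    """Fraction of queries whose true identity appears in the top-k ranked gallery items."""
    top_k_ids = gallery_ids[ranking[:, :k]]
    hits = (top_k_ids == query_ids[:, None]).any(axis=1)
    return hits.mean()

# Toy features, assumed to be already projected into the shared space.
gallery = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
gallery_ids = np.array([0, 1, 2])
query = np.array([[0.2, -0.1], [9.5, 0.3], [4.0, 4.0]])
query_ids = np.array([0, 2, 0])  # the third query lies closest to a different identity

ranking = rank_gallery(query, gallery)
rank1 = rank_k_accuracy(ranking, query_ids, gallery_ids, k=1)  # 2 of 3 queries hit
rank2 = rank_k_accuracy(ranking, query_ids, gallery_ids, k=2)  # all 3 queries hit
```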

Practical and Theoretical Implications

Practically, CAMEL's ability to learn view-specific transformations without labeled data has significant implications for scaling RE-ID systems in real-world applications, where labeled data is often scarce or expensive to produce. Theoretically, CAMEL invites further exploration of asymmetric metric learning and clustering mechanisms, suggesting pathways for research on unsupervised learning algorithms applicable beyond RE-ID.

Future Directions

Looking forward, the model suggests new research avenues in unsupervised learning. One direction is to integrate deep learning methods to further reduce view-specific interference; another is to explore semi-supervised settings in which a small amount of labeled data is available to improve performance. A further prospect is applying CAMEL's asymmetric clustering strategy to other domains that require cross-domain or cross-view invariance, such as multi-modal data integration or domain adaptation in computer vision and beyond.

In conclusion, this paper advances unsupervised person re-identification by recognizing and tackling the biases induced by cross-view camera setups. By combining unsupervised asymmetric metric learning with clustering, it sets a new precedent for handling large, unlabelled RE-ID datasets, paving the way for more adaptable and scalable surveillance systems.