Embedding Deep Metric for Person Re-identification: A Study Against Large Variations (1611.00137v1)

Published 1 Nov 2016 in cs.CV and cs.LG

Abstract: Person re-identification is challenging due to the large variations of pose, illumination, occlusion and camera view. Owing to these variations, the pedestrian data is distributed as highly-curved manifolds in the feature space, despite the feature extraction capability of current convolutional neural networks (CNNs). However, the distribution is unknown, so it is difficult to use the geodesic distance when comparing two samples. In practice, the current deep embedding methods use the Euclidean distance for training and testing. On the other hand, the manifold learning methods suggest using the Euclidean distance in the local range, combined with the graphical relationship between samples, for approximating the geodesic distance. From this point of view, selecting suitable positive (i.e. intra-class) training samples within a local range is critical for training the CNN embedding, especially when the data has large intra-class variations. In this paper, we propose a novel moderate positive sample mining method to train a robust CNN for person re-identification, dealing with the problem of large variation. In addition, we improve the learning by a metric weight constraint, so that the learned metric has a better generalization ability. Experiments show that these two strategies are effective in learning robust deep metrics for person re-identification, and accordingly our deep model significantly outperforms the state-of-the-art methods on several benchmarks of person re-identification. Therefore, the study presented in this paper may be useful in inspiring new designs of deep models for person re-identification.

Citations (306)

Summary

The paper presents a moderate positive sample mining technique to train CNNs that robustly address large intra-class variations in person re-identification.
It employs Frobenius norm-based regularization on metric layers to balance discriminative power with improved generalization and reduced overfitting.
Experiments on CUHK03, CUHK01, and VIPeR validate the approach, with state-of-the-art rank-one identification rates reported on CUHK03 and CUHK01.

Embedding Deep Metric for Person Re-identification: A Study Against Large Variations

The paper "Embedding Deep Metric for Person Re-identification: A Study Against Large Variations" addresses person re-identification under the large intra-class variations that pose, lighting, occlusion, and camera angle introduce into pedestrian data. Its main proposal is a moderate positive sample mining method for training convolutional neural networks (CNNs) more robustly on this task.

Key Contributions and Methodological Advances

At the core of the work is the moderate positive sample mining technique, which dynamically selects positive training pairs so that training reduces intra-class variance while preserving the intrinsic manifold structure of pedestrian data. The approach is inspired by manifold learning, which suggests that Euclidean distance within a local range, combined with graph relationships between samples, can approximate geodesic distance. The paper reports that this mining strategy is crucial for learning robust deep embeddings when intra-class variations are large.
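The selection step can be illustrated with a minimal sketch. This summary does not spell out the exact rule, so the code below assumes one plausible reading: for a given anchor, find the hardest (closest) negative, keep the positives that are no farther than that negative, and pick the farthest of those as the "moderate" positive, falling back to the closest positive when none qualifies. The function name `mine_moderate_positive` and the toy feature vectors are hypothetical and not taken from the paper.

```python
import numpy as np

def mine_moderate_positive(anchor, positives, negatives):
    """Return the index (into `positives`) of the assumed 'moderate' positive.

    Assumed rule, not confirmed by the summary:
      1. hardest negative = negative with the smallest distance to the anchor
      2. eligible positives = positives no farther than the hardest negative
      3. moderate positive = the farthest eligible positive
      4. if no positive is eligible, fall back to the closest positive
    """
    anchor = np.asarray(anchor, dtype=float)
    positives = np.asarray(positives, dtype=float)
    negatives = np.asarray(negatives, dtype=float)

    # Euclidean distances in the embedding space
    d_pos = np.linalg.norm(positives - anchor, axis=1)
    d_neg = np.linalg.norm(negatives - anchor, axis=1)

    hardest_neg = d_neg.min()
    eligible = np.flatnonzero(d_pos <= hardest_neg)
    if eligible.size == 0:
        return int(d_pos.argmin())
    return int(eligible[d_pos[eligible].argmax()])

# Toy usage: positives at distances 1, 2, 5; negatives at distances 3, 4.
anchor = [0.0, 0.0]
positives = [[1.0, 0.0], [2.0, 0.0], [5.0, 0.0]]
negatives = [[3.0, 0.0], [4.0, 0.0]]
chosen = mine_moderate_positive(anchor, positives, negatives)
```

Under this assumed rule, the very distant positive (distance 5) is skipped, which matches the stated goal of selecting positives within a local range rather than forcing the network to collapse the whole class at once.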

The paper also places a weight constraint on the metric learning layer of the CNN to counteract overfitting, a common issue given the complexity of pedestrian data. A Frobenius norm-based regularizer encourages the learned metric matrix to stay close to the identity matrix, so the method trades off the discriminative power of a Mahalanobis distance against the better generalization of the Euclidean distance.
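A small sketch may make the regularizer concrete. It assumes the metric matrix is parameterized as M = W Wᵀ for a learnable weight matrix W and that the penalty takes the form (λ/2)·‖W Wᵀ − I‖²_F; the summary only states that a Frobenius norm pulls the metric toward the identity, so this exact form, the names `W` and `lam`, and the helper functions are assumptions for illustration.

```python
import numpy as np

def mahalanobis_distance(x1, x2, W):
    """Distance under the assumed metric M = W @ W.T.

    sqrt((x1 - x2)^T W W^T (x1 - x2)) equals the Euclidean norm of W^T (x1 - x2).
    """
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.linalg.norm(W.T @ diff))

def identity_penalty(W, lam):
    """Assumed regularizer: (lam / 2) * ||W W^T - I||_F^2."""
    M = W @ W.T
    residual = M - np.eye(M.shape[0])
    return 0.5 * lam * float(np.sum(residual ** 2))

def identity_penalty_grad(W, lam):
    """Gradient of the assumed regularizer with respect to W: 2 * lam * (W W^T - I) W."""
    M = W @ W.T
    return 2.0 * lam * (M - np.eye(M.shape[0])) @ W

# With W = I the metric is exactly Euclidean and the penalty vanishes.
W_id = np.eye(2)
d_euclid = mahalanobis_distance([3.0, 4.0], [0.0, 0.0], W_id)
p_id = identity_penalty(W_id, lam=1.0)

# With W = 2I, M = 4I, so ||M - I||_F^2 = 2 * 3^2 = 18 and the penalty is 9.
p_scaled = identity_penalty(2.0 * np.eye(2), lam=1.0)
```

A larger `lam` keeps the learned distance closer to plain Euclidean distance, while a smaller `lam` leaves more freedom for a full Mahalanobis metric; that is the trade-off the paragraph above describes.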

Experimental Evaluation and Results

The method was evaluated on three widely used person re-identification datasets, CUHK03, CUHK01, and VIPeR, which differ in scale and in the complexity of their data distributions. In comparative experiments it outperformed existing approaches on CUHK03 and CUHK01, reaching state-of-the-art rank-one identification rates, and it also performed well on the notoriously challenging VIPeR dataset. The evaluation further examined parameter settings, including the regularization parameter, which balances within-class and between-class variances.
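For readers unfamiliar with the metric, the rank-one identification rate is the fraction of query images whose nearest gallery image has the same identity. The sketch below computes the more general rank-k rate from a query-by-gallery distance matrix. It assumes a simple single-shot setting in which each query identity appears in the gallery; the function name `rank_k_rate` and the toy numbers are hypothetical, and the datasets' actual evaluation protocols (for example, averaging over repeated random splits) are not reproduced here.

```python
import numpy as np

def rank_k_rate(dist, query_ids, gallery_ids, k=1):
    """Fraction of queries with a correct identity among the k nearest gallery items.

    dist[i, j] is the distance between query i and gallery item j.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Indices of the k closest gallery items for every query
    top_k = np.argsort(dist, axis=1)[:, :k]
    hits = (gallery_ids[top_k] == query_ids[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 queries, 3 gallery images, one per identity.
dist = [
    [0.1, 0.9, 0.5],  # query 0: nearest is gallery 0 (correct)
    [0.8, 0.7, 0.2],  # query 1: nearest is gallery 2 (wrong), second is gallery 1
    [0.6, 0.3, 0.4],  # query 2: nearest is gallery 1 (wrong), second is gallery 2
]
ids = [0, 1, 2]
rank1 = rank_k_rate(dist, ids, ids, k=1)
rank2 = rank_k_rate(dist, ids, ids, k=2)
```

In this toy case only the first query is matched at rank one, while all three are matched within the top two.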

Implications and Future Directions

The research contributes to metric learning for person re-identification, particularly in handling the variations found in real-world surveillance scenarios. Refining deep metric networks through moderate positive mining and weight constraints may inspire future designs in person re-identification systems and, more broadly, in other areas of pattern recognition and computer vision. One promising avenue for future work is to extend the method to larger-scale datasets and to combine it with newer techniques, such as attention mechanisms or transformer networks, to further improve efficiency and accuracy.

In conclusion, the paper makes a valuable addition to the growing body of knowledge in computer vision, particularly within the context of person re-identification. The novel training strategies it introduces can potentially lead to the development of more robust neural architectures capable of navigating the expansive and complex feature spaces inherent in pedestrian datasets.
