Person Re-Identification by Deep Joint Learning of Multi-Loss Classification (1705.04724v2)

Published 12 May 2017 in cs.CV and cs.AI

Abstract: Existing person re-identification (re-id) methods rely mostly on either localised or global feature representation alone. This ignores their joint benefit and mutual complementary effects. In this work, we show the advantages of jointly learning local and global features in a Convolutional Neural Network (CNN) by aiming to discover correlated local and global features in different context. Specifically, we formulate a method for joint learning of local and global feature selection losses designed to optimise person re-id when using only generic matching metrics such as the L2 distance. We design a novel CNN architecture for Jointly Learning Multi-Loss (JLML) of local and global discriminative feature optimisation subject concurrently to the same re-id labelled information. Extensive comparative evaluations demonstrate the advantages of this new JLML model for person re-id over a wide range of state-of-the-art re-id methods on five benchmarks (VIPeR, GRID, CUHK01, CUHK03, Market-1501).

Citations (412)


Summary

The paper introduces a dual-branch CNN integrating global and local feature learning with multi-loss classification for effective person re-identification.
The model employs structured sparsity regularisation and a custom ResNet39 design to balance feature learning with computational efficiency.
Experimental results on five benchmark datasets, including Market-1501, show Rank-1 accuracy improvements over prior state-of-the-art methods.

Overview of "Person Re-Identification by Deep Joint Learning of Multi-Loss Classification"

This paper introduces an approach to person re-identification (re-id) that integrates local and global feature learning within a single Convolutional Neural Network (CNN), trained with a multi-loss classification objective. Its central argument is that existing re-id frameworks typically rely on either local or global feature representations alone. This is suboptimal for matching identities across non-overlapping camera views, because the two kinds of features carry complementary information.

Methodological Contributions

The proposed model, termed Joint Learning of Multi-Loss (JLML) CNN, uses a dual-branch architecture that optimises local and global feature representations simultaneously, rather than emphasising a single level of features as conventional methods do. Its main components are:

Two-Branch Architecture: The model is a dual-branch CNN in which one branch learns global features from the entire pedestrian bounding-box image, while the other learns local features from horizontal stripes of the image that roughly correspond to body parts.
Multi-Loss Objective: Each branch is optimised with its own cross-entropy classification loss, and both losses are supervised by the same person identity labels. This avoids the complex pairwise metric-learning objectives used by many re-id methods, and the learned features can be matched with a generic metric such as the L2 distance.
Structured Sparsity Regularisation: To make feature learning more robust, the model adds structured sparsity regularisation, which acts as a feature selection mechanism intended to improve robustness to noisy data and correlated features.
Efficient Network Design: A customised ResNet39, balancing depth against model complexity, implements both branches through a combination of shared and branch-specific CNN layers, giving effective feature learning at competitive computational cost.
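The global/local split described under Two-Branch Architecture can be illustrated with a minimal pure-Python sketch on a single-channel feature map. The stripe count `m`, the equal-height split, the use of average pooling and all function names are hypothetical choices for illustration, not details taken from the paper.

```python
def mean(values):
    return sum(values) / len(values)

def split_into_stripes(feature_map, m):
    """Split an H x W map (a list of rows) into m horizontal stripes of equal
    height; the last stripe absorbs any leftover rows. Requires 1 <= m <= H."""
    h = len(feature_map)
    if not 1 <= m <= h:
        raise ValueError("need 1 <= m <= number of rows")
    step = h // m
    return [feature_map[i * step:((i + 1) * step if i < m - 1 else h)]
            for i in range(m)]

def global_feature(feature_map):
    # Global branch (illustrative): average-pool over the whole map.
    return mean([v for row in feature_map for v in row])

def local_features(feature_map, m):
    # Local branch (illustrative): average-pool each horizontal stripe separately.
    return [mean([v for row in stripe for v in row])
            for stripe in split_into_stripes(feature_map, m)]

# A 4 x 2 single-channel map whose rows get brighter towards the bottom.
fmap = [[1, 1], [2, 2], [3, 3], [4, 4]]
print(global_feature(fmap))     # 2.5
print(local_features(fmap, 2))  # [1.5, 3.5]
```

The global value summarises the whole image, while the per-stripe values keep coarse spatial layout (top versus bottom), which is the kind of complementary information the two branches are meant to capture.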
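The multi-loss objective can be sketched as a weighted sum of two softmax cross-entropy losses, one per branch, both computed against the same identity label. This is a minimal sketch: the equal weights `w_global = w_local = 1.0` and the function names are assumptions for illustration, and any sparsity terms are omitted.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample: -log softmax(logits)[label]."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def multi_loss(global_logits, local_logits, label, w_global=1.0, w_local=1.0):
    """Hypothetical joint objective: one identity-classification loss per
    branch, both supervised by the same label, combined as a weighted sum."""
    return (w_global * cross_entropy(global_logits, label)
            + w_local * cross_entropy(local_logits, label))

# Toy example with 3 identities; the true identity is class 0.
g = [2.0, 0.5, -1.0]   # global-branch logits
l = [1.5, 1.0, 0.0]    # local-branch logits
print(round(multi_loss(g, l, label=0), 4))  # 0.8454
```

Because both terms use only identity labels, no pairs or triplets of images have to be sampled during training.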
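Structured sparsity is commonly expressed through mixed matrix norms. The sketch below computes two standard ones on a weight matrix whose rows are treated as groups: the l2,1 norm used by group LASSO, and a squared-l1-per-group penalty of the kind used by exclusive group LASSO. Which norm the paper applies to which branch, how groups are defined and how the penalty is weighted are not reproduced here, so `regularised_loss` and `lam` are hypothetical and this is not the JLML regulariser itself.

```python
import math

def l21_norm(W):
    """l2,1 norm (group LASSO): sum over groups (rows) of each row's l2 norm.
    As a penalty it tends to zero out whole rows, i.e. discard feature groups."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def l12_norm_sq(W):
    """Exclusive-group-LASSO penalty in one common convention: sum over groups
    (rows) of the squared l1 norm of each row. It favours sparsity within each
    group while spreading non-zero weights across groups."""
    return sum(sum(abs(w) for w in row) ** 2 for row in W)

def regularised_loss(task_loss, W, lam, norm=l21_norm):
    # Hypothetical combination: classification loss plus a weighted sparsity penalty.
    return task_loss + lam * norm(W)

W = [[3.0, 4.0], [0.0, 0.0]]
print(l21_norm(W))     # 5.0
print(l12_norm_sq(W))  # 49.0
```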

Experimental Outcomes

Evaluations on five benchmark datasets (VIPeR, GRID, CUHK01, CUHK03 and Market-1501) show that the proposed JLML model outperforms a wide range of state-of-the-art approaches, from traditional hand-crafted methods to deep learning-based re-id solutions. The model consistently improves Rank-1 identification rates, with particularly pronounced gains on larger datasets such as Market-1501. The analysis indicates that the combined feature representation exploits both local and global context, which helps mitigate challenges such as misalignment and occlusion in person re-id.
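Because JLML matches people with a generic L2 distance, Rank-1 accuracy reduces to nearest-neighbour search over the gallery. A minimal sketch, assuming (for illustration only) that the re-id descriptor is the concatenation of the global and local branch features, and ignoring protocol details such as same-camera filtering and multi-shot galleries; all function names are hypothetical.

```python
import math

def joint_descriptor(global_feat, local_feat):
    # Assumption: final descriptor = concatenation of both branch outputs.
    return list(global_feat) + list(local_feat)

def l2(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery descriptor (by L2 distance)
    carries the same identity label, i.e. the Rank-1 (CMC@1) score."""
    hits = 0
    for q, qid in zip(query_feats, query_ids):
        nearest = min(range(len(gallery_feats)),
                      key=lambda j: l2(q, gallery_feats[j]))
        hits += gallery_ids[nearest] == qid
    return hits / len(query_feats)

# Toy gallery of three identities (global part has 2 dims, local part 1 dim).
gallery = [joint_descriptor(g, l) for g, l in
           [([0.0, 0.0], [1.0]), ([5.0, 5.0], [2.0]), ([10.0, 0.0], [3.0])]]
gallery_ids = [1, 2, 3]

queries = [joint_descriptor([0.2, 0.1], [1.0]),   # identity 1, closest to gallery id 1
           joint_descriptor([9.0, 0.5], [3.0])]   # identity 2, but closest to gallery id 3
query_ids = [1, 2]

r = rank1_accuracy(queries, query_ids, gallery, gallery_ids)
print(r)  # 0.5: one of two queries is matched correctly at rank 1
```

No learned metric appears anywhere in this evaluation loop; all the discriminative work is pushed into the features themselves, which is the design choice the multi-loss classification objective is meant to support.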

Implications and Future Directions

The main implication of this work is a move away from models that use a single feature scope towards a unified framework that exploits both local detail and global context. Possible extensions include adaptive schemes that weight local and global features dynamically according to the operating environment or dataset characteristics. Integration with unsupervised or semi-supervised learning could also extend JLML to settings where labelled data is scarce.

In summary, the paper presents a notable methodological advance in person re-identification: a joint local-global feature learning approach that could be adapted to a range of real-world surveillance and biometric applications.