An Enhanced Deep Feature Representation for Person Re-identification (1604.07807v2)

Published 26 Apr 2016 in cs.CV

Abstract: Feature representation and metric learning are two critical components in person re-identification models. In this paper, we focus on the feature representation and claim that hand-crafted histogram features can be complementary to Convolutional Neural Network (CNN) features. We propose a novel feature extraction model called Feature Fusion Net (FFN) for pedestrian image representation. In FFN, back propagation makes CNN features constrained by the handcrafted features. Utilizing color histogram features (RGB, HSV, YCbCr, Lab and YIQ) and texture features (multi-scale and multi-orientation Gabor features), we get a new deep feature representation that is more discriminative and compact. Experiments on three challenging datasets (VIPeR, CUHK01, PRID450s) validate the effectiveness of our proposal.

Citations (268)


Summary

The paper presents Feature Fusion Net (FFN), a novel model that enhances person re-identification by fusing both deep CNN and hand-crafted features to improve discriminative capability.
FFN employs a dual-branch network architecture, processing CNN and hand-crafted features separately before combining them in a fusion layer to learn a unified representation.
Experiments demonstrate that FFN achieves significant improvements in Rank-1 matching rates on public datasets like VIPeR, CUHK01, and PRID450s by effectively combining complementary feature types.

An Enhanced Deep Feature Representation for Person Re-identification

The paper presents Feature Fusion Net (FFN), a feature extraction model for person re-identification that combines convolutional neural network (CNN) features with hand-crafted features to improve discriminative capability. The hand-crafted inputs are color histogram features computed in the RGB, HSV, YCbCr, Lab, and YIQ color spaces, together with multi-scale, multi-orientation Gabor texture features. Integrating them with CNN features results in a more robust and compact deep feature representation.
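To make the color histogram idea concrete, here is a minimal sketch that builds a concatenated per-channel histogram over two of the listed color spaces (RGB and HSV) using only the Python standard library. The function names, the bin count, and the restriction to two color spaces are assumptions made for illustration; this is not the paper's extraction pipeline, which also covers YCbCr, Lab, YIQ, and Gabor texture features.

```python
import colorsys

def channel_histogram(values, bins=16):
    """Normalized histogram of values assumed to lie in [0, 1]."""
    counts = [0] * bins
    for v in values:
        # Clamp so that v == 1.0 falls into the last bin.
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    total = len(values) or 1
    return [c / total for c in counts]

def color_histogram_feature(pixels, bins=16):
    """Concatenate per-channel histograms over RGB and HSV.

    pixels: list of (r, g, b) tuples with components in [0, 1].
    Returns a list of length 2 color spaces * 3 channels * bins.
    """
    hsv = [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
    feature = []
    for space in (pixels, hsv):
        for ch in range(3):
            feature.extend(channel_histogram([p[ch] for p in space], bins))
    return feature

# Hypothetical toy "image" of four pixels.
toy_pixels = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)]
toy_feature = color_histogram_feature(toy_pixels, bins=4)
```

Each per-channel histogram sums to one, so the resulting vector does not depend on image size. In practice such histograms are usually computed over image regions rather than the whole image, which is a design choice left out of this sketch.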

Person re-identification is a challenging problem due to the substantial variations in appearance caused by changes in view angle, lighting conditions, background clutter, and occlusion across different camera views. Existing methods often focus either on features that remain invariant across views or on robust metrics to handle these challenges. While CNNs have shown great potential in handling such vision tasks by adapting through backpropagation, the hand-crafted features remain critical as they are specifically designed to manage large variations in appearance.

FFN employs a dual-branch network design: one branch processes the image through standard convolutional, pooling, and activation layers, while the other branch processes hand-crafted features designed specifically for person re-identification. The outputs of the two branches are combined in a fusion layer, allowing the network to learn a unified feature representation in which the CNN features are constrained and regularized by the hand-crafted features. This fusion process enables FFN not only to retain but also to enhance the discriminative nature of these complementary features.
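The dual-branch fusion described above can be sketched as a forward pass in NumPy. Everything specific here is an assumption for illustration: the per-branch dense projection, the layer sizes, the ReLU activations, and the names `init_linear` and `fusion_forward` are hypothetical, and the CNN branch is represented only by a precomputed feature vector rather than actual convolutional layers.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_linear(rng, n_in, n_out):
    """Small random weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

def fusion_forward(cnn_feat, hand_feat, params):
    """Project each branch, concatenate, and apply a fusion layer.

    cnn_feat:  (batch, d_cnn) output of the convolutional branch.
    hand_feat: (batch, d_hand) hand-crafted feature vectors.
    """
    (w_c, b_c), (w_h, b_h), (w_f, b_f) = params
    cnn_proj = relu(cnn_feat @ w_c + b_c)     # CNN branch projection
    hand_proj = relu(hand_feat @ w_h + b_h)   # hand-crafted branch projection
    fused_in = np.concatenate([cnn_proj, hand_proj], axis=-1)
    return relu(fused_in @ w_f + b_f)         # unified representation

# Hypothetical toy dimensions, chosen only to keep the example small.
rng = np.random.default_rng(0)
d_cnn, d_hand, d_proj, d_out = 32, 24, 16, 8
params = (
    init_linear(rng, d_cnn, d_proj),
    init_linear(rng, d_hand, d_proj),
    init_linear(rng, 2 * d_proj, d_out),
)
fused = fusion_forward(rng.normal(size=(2, d_cnn)),
                       rng.normal(size=(2, d_hand)), params)
```

In a trained version of such a network, the loss gradient at the fusion layer would flow back into the convolutional branch while the hand-crafted inputs stay fixed, which is one way to read the abstract's statement that back propagation constrains the CNN features by the hand-crafted ones.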

The paper evaluates FFN on three publicly available datasets: VIPeR, CUHK01, and PRID450s. The reported Rank-1 matching rates improve on state-of-the-art methods by 8.09%, 7.98%, and 11.2% on these datasets, respectively. The improvement is attributed to FFN's ability to extract regularized CNN features that are complementary to the hand-crafted features it uses. Combining the features inside the FFN framework is also shown to outperform straightforward concatenation approaches such as ELF16+CNN-FC7, suggesting that the network benefits substantially from integrating the feature types rather than simply concatenating them.
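For readers unfamiliar with the Rank-1 matching rate, the following minimal sketch computes a rank-k matching rate from a query-to-gallery distance matrix. It assumes a single-shot setting in which each query identity appears exactly once in the gallery; the function name and the toy distances are hypothetical and are not taken from the paper.

```python
def rank_k_rate(dist, query_ids, gallery_ids, k=1):
    """Fraction of queries whose true match is among the k nearest gallery items.

    dist[i][j] is the distance between query i and gallery item j.
    """
    hits = 0
    for row, qid in zip(dist, query_ids):
        # Gallery indices sorted from nearest to farthest.
        order = sorted(range(len(row)), key=row.__getitem__)
        top_ids = [gallery_ids[j] for j in order[:k]]
        hits += qid in top_ids
    return hits / len(query_ids)

# Hypothetical toy distances for three queries and three gallery items.
toy_dist = [
    [0.1, 0.9, 0.5],  # query 0: nearest is gallery 0 (correct)
    [0.8, 0.7, 0.2],  # query 1: nearest is gallery 2 (wrong), then gallery 1
    [0.3, 0.6, 0.4],  # query 2: nearest is gallery 0 (wrong), then gallery 2
]
toy_ids = [0, 1, 2]
rank1 = rank_k_rate(toy_dist, toy_ids, toy_ids, k=1)
rank2 = rank_k_rate(toy_dist, toy_ids, toy_ids, k=2)
```

Evaluating the same function for increasing k gives the points of a cumulative matching characteristic curve, with Rank-1 being its first value.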

FFN demonstrates the value of combining deep learning with traditional hand-crafted approaches and offers insight into how CNN-based models can be optimized under the guidance of hand-crafted features. Because the architecture requires neither pairwise inputs nor re-training for each target dataset, it also offers practical advantages in time efficiency and adaptability.

In summary, the FFN addresses critical issues in person re-identification by leveraging hand-crafted features to enhance deep feature representations. This work sets a precedent for exploring further integration of neural networks with domain-specific features. Future research could investigate refining the FFN architecture to utilize more sophisticated hand-crafted features or extend its application to other domains beyond person re-identification in surveillance scenarios.
